I wasn’t going to blog about performance until a later post, but there was a performance issue I wanted to tackle before discussing some other things.
The frame rate sometimes seemed to be as expected, but sometimes slow, and the application was quite unresponsive when handling the Stop button command.
To help nail this down I created a Buffer Frame Rate performance counter, so that I monitor it along with another couple of metrics (I chose to see the count of gen1 and gen2 garbage collections, along with the % time in GC, and % processor time).
The framerate average isn’t bad, but the frame rate is not consistent; it regularly drops below 10 frames per second. This post will discuss optimising the performance of the application to keep the frame rate steady and ensure that the app is responsive.
Performance increases should generally focus on reducing unnecessary work in the whole application, and not micro-optimisations. In this case, it’s realising that the top chart is drawing at 40 frames per second (a buffer of 1000 data points into a 40KHz source), whereas the bottom chart is updating at 400 frames per second (sampling the data to 100 points, and updating the chart on each of those sample points).
However, as there are other types of charts I want to produce it’s worth investigating where the time is spent.
First off, I tried turning off anti-aliasing, and moving over to the d3future project (http://d3future.codeplex.com ), to see if that made any performance difference, but the performance was pretty similar, with the CPU pegged at 100%. A further run generated this interesting trace:
The frame count starts initially high, but after a while performance drops off dramatically, which is followed by some long spikes in the % Time in GC. The drop-off maybe seems to correspond with one of the gen 2 collections. This situation wasn’t so reproducible that I could catch it under a profiler.
My next step was to profile to see where the application was spending its time, just using the Visual Studio profiler. The application’s hot path was in the D3 LineChart’s UpdateCore method, where it was updating the chart’s Bounds using data from the EnumerableDataSource –it was iterating over the whole collection to find the x and y min and max values (to fit the chart to the data area).
It seems unnecessary to iterate over the whole data source to get the minimum and maximum x values – for a linear chart (i.e. not a scatter chart) it would be expected that the first point is at the minimum x value and the last point the maximum.
I created a chart derived from LineChart where I could instead pass in a List of points for the datasource – this means that I could get the last point without enumerating the whole collection, allowing the min and max x values to be quickly found. For the y values, I happen to want the chart to be a fixed axis and not scale to fit the data (instead scaled by a user-configurable gain factor), so there was no need at all to iterate over the collection.
The chart looked like this after those changes:
The frame rate is pretty consistent now. The gen 1 allocations are still occurring at a rate of 200 per minute, but the application is now responsive to the stop click, stopping immediately.
Profiling with memory allocations turned on showed that the types with most instances was System.Byte with 32% of the instances allocated in the profiling period.
ConvertByteArrayToFloatArray() previously looked like this:
And after changing this to not use LINQ:
Gives the following:
Performance is pretty similar to previous, but the gen 1 collections are occurring at a slightly lower rate. Now, profiling tells me that the types with most instances allocated is pretty much split between System.Action, System.Object, and System.Reactive.Disposables.SingleAssignmentDisposable.
Profiling at this point identifies the majority of the work now being in LineGraph.UpdateCore() where its transforming all points from their data coordinates into screen coordinates.
The transformation of coordinates for many points is embarrassingly parallel (think GPGPU). Potentially, the GPU could be put to use by simply drawing the points untransformed, adding a transformation onto the drawing context.
Further performance increases could be done for the sliding window chart: A sliding window chart redisplays the same points again and again, as they move left along the x axis. It essentially transforms the same point many times into the new viewport. Doing a transform to move an existing point to the left could be more efficient than transforming again from data to screen coordinates.
This blog so far has looked at some micro-optimisations focussing on identifying areas where work can be reduced. As I said in the beginning, if I wasn’t so interested in optimising for future requirements, I’d simply ensure that I was doing less drawing straight away.
Anyway, the sliding observable can be change to only take every 10th windowed set of values, by introducing another Sample() on the result of the WindowWithCount():
There isn’t actually that much difference in real-world performance.
The rate of gen 1 collections is lower still than previous, but the profiler shows that app is still spending the majority of its time in drawing the charts. Surprisingly the CPU usage doesn’t seem to have dropped much, although in the profiler the time can now be seen to be split between the drawing of the two charts, instead of dominated by the drawing of the sliding window chart.
The micro-optimisations were still worth investigating, as I have charts in mind that will draw multiple traces simultaneously.