TPL Dataflow

This blog post is coming a couple of weeks late due to my copy of Windows becoming corrupted – and strangely the only things I didn’t have backed up were this Oscilloscope app!

Anyway, last time I took a look at charting performance, but this post will investigate TPL Dataflow.

In my original F# post, I discussed that I felt that the code would have been better implemented in RX, which I then went and did in this post. However, RX isn’t the only recent new technology coming out of Microsoft that deals with streams of data. TPL Dataflow has a confusingly similar overlap with the usages of RX, and this is what its whitepaper has to say:

“Astute readers may notice some similarities between TPL Dataflow and Reactive Extensions (Rx), currently available as a download from the MSDN Data developer center. Rx is predominantly focused on coordination and composition of event streams with a LINQ-based API, providing a rich set of combinators for manipulating IObservable<T>s of data. In contrast, TPL Dataflow is focused on providing building blocks for message passing and parallelizing CPU- and I/O-intensive applications with high-throughput and low-latency, while also providing developers explicit control over how data is buffered and moves about the system. As such, Rx and TPL Dataflow, while potentially viewed as similar at a 30,000 foot level, address distinct needs. Even so, TPL Dataflow and Rx provide a better together story.”

That does sound very interesting – who doesn’t want better performance!


I’ll dive in straight away and look at some code. The code is structured similarly to the RX code in my previous post.


The microphone still provides an IObservable<float[]> via the GetStream() method. The TaskSchedulers are specified explicitly throughout, so that the code can be unit tested (I’m using a MockSynchronizationContext).

The transformManyBlock behaves in the same way to the SelectMany operator, it separates the float array into its constituent parts.

The broadcast block is analogous to a connectable observable – many blocks can be linked to it to receive the same messages.

The transform many block is then explicitly linked to the broadcast block, meaning that its outputs make it into the broadcast block’s input.

Oscilloscope/Buffered View

The code to deal with the buffered display of data is:


The batch block behaves the same as LINQ’s Buffer operator (a bit confusingly when TPL Dataflow’s Buffer means something different). This simply takes the incoming float values and packages them into an array.

The ActionBlock is the final point on the chain – it takes the values it receives and outputs them to the screen (via oscilloscopeOutput) – this is the equivalent code to the code in the Observable Subscription in the RX version of my code.

Sliding Window View

Again, the sliding window view is structured similarly to the previous RX code:


A sample block is created so that every n samples are fed into the SlidingWindowBlock. This returns a new float[] array representing the new window of data on every input.

The second sample block ensures that not every window of data is drawn (so that the chart is refreshed at 40fps rather than 400fps!)

Finally, the observable is subscribed to, and the messages are posted into the transformManyBlock:


Whereas the RX guidelines recommend creating new operators based on existing operators, TPL Dataflow encourages building up custom blocks (though they can be built out of combining existing blocks).

The SlidingWindowBlock is a slightly modified version of the one provided in the white paper, and here ( The one in the example only posts a window once it has received a full window worth of data, whereas I want to start drawing the windows as soon as data arrives.


The SampleBlock is simple, it yields a value once it has received enough inputs, discarding the intermediate ones. This is structured more efficiently than my RX implementation – that one buffers the values, and then only take the last value out of the buffer.



I removed the updating of the chart, to look purely at what the performance of the app is in taking the input from the microphone, and shaping it to be in the correct format for output.

RX Performance:


TPL Dataflow:


The processor usage between the apps is in the same ballpark, but the TPL Dataflow app has many more garbage collections, and # bytes in all heaps also is running slightly higher. I had a quick look in the profiler, and it seems that there are many Task allocations from the Dataflow.Post() operation from within the SlidingWindow and Sample implementations.

The extra generation 1 collections don’t really matter for this application with the functionality that it currently has – the % time in GC barely registers, and there are no Gen 2 garbage collections over the minute or so that the app was running.

Once the app is doing more calculations and drawing more complex charts it would be interesting to see whether any latency spikes due gen 2 GCs cause similar slowdowns to the one discussed at the start of my previous post.

It would be an interesting exercise to limit the amount of GCs throughout the app, for instance there’s no need for the microphone access observable to return an IObservable<float[]> instead of IObservable<float>; currently every read from the microphone allocates a new float[]. Similarly, new List<Point> are created to more easily interface with DynamicDataDisplay – it would be better to change the types of data that D3 takes to be more observable-friendly, and to save having so many allocations. Again, there’s not much point doing this, other than an interesting theoretical exercise, until the garbage collection overhead proves to be a performance issue.


For an application as simple as this, there isn’t any benefit to using TPL Dataflow – it is a powerful library, with functionality such as blocks being able to decline offered blocks, and request them later, which would be difficult to implement in RX. As my app doesn’t (currently) need that level of functionality, there’s no benefit to using the library.

I may revisit this in the future – if I had some computationally expensive operation (FFT for instance) where I’d want greater control over the flow of data through the system.