Monte Carlo double barrier option pricing on GPU using C++ AMP

I converted the double barrier option pricing code from my last post to run on the GPU using C++ AMP, and got an 80x speedup.

This is running on a low-powered Acer V5-122p laptop with an AMD A6-1450 processor and an integrated Radeon HD 8250 GPU.

The gist is here:

To be fair, I converted the CPU code to be otherwise identical to the GPU code: instead of populating an array with the sample path, it just checks at each point whether the value breaches the upper or lower barrier, and uses the last value for the payoff.

This reduced the runtime from 2540ms, for the code in my last blog post, to 1250ms, i.e. a 2x speedup.

The GPU code was run twice: the first run took 140ms, and the second run (after all the shaders were already compiled etc.) took 15.6ms, i.e. a very impressive 80x speedup over the CPU code.

If anything, it shows that AMD's strategy of cheap, low-powered laptop chips will pay off if people start taking advantage of the relatively strong GPU.

2 thoughts on “Monte Carlo double barrier option pricing on GPU using C++ AMP”

  1. It'd be interesting to see what kind of benefits you could get from the new SSE capabilities in managed code that are in the VS 2014 CTP. I doubt it will match the speed up you're getting from the GPU, but still would be interesting.

  2. Hey Niall,
    good question! I didn't fancy porting a parallel random number generator to .NET, let alone using SSE, so I don't know… I don't think there are SSE bit-shift operations etc.

    It wouldn't be more than a 4x speedup on my machine though (SSE registers are 128-bit, so four 32-bit floats). Including a potential 4x speedup from parallelising across my 4 cores, that could be 16x…

    I've been playing with the new .NET SSE stuff and RyuJIT for something else – will blog about it soon 😉

Comments are closed.