I converted the double barrier option pricing code from my last post to run on the GPU, using C++ AMP, and got an 80% speedup.
This is running on a low powered Acer V5-122p laptop with an AMD A6-1450 processor with an integrated Radeon HD 8250 GPU.
The gist is here: https://gist.github.com/taumuon/bbeeb9e2c1f5082a2699
To be fair, I converted the CPU code to be otherwise identical to the gpu code. Instead of populating an array of the sample path, it instead for each point just determines whether the value breaches the upper or lower barriers and uses the last value for the payoff.
This reduced the runtime from the code in my last blog post, of 2540ms, to 1250ms, i.e. a 2x speedup.
The GPU code was ran twice, the first time it ran in 140ms, the second run (after all shaders already compiled etc) was 15.6ms, i.e. a very impressive 80x speedup from the CPU code.
If anything, it shows that AMDs strategy of cheap low powered laptop chips will payoff if people start taking advantage of the relatively strong GPU.
My last blog post was about pricing of exotic options using F#. In that post I said “Apart from the Monte Carlo code, the fact that in F# all functions are in the curried form means that composing the payoff functions using partial function application is beautiful. This would be lambda hell to replicate in C#.”
The OO version of this would probably be built via composition: having a class implementing a IPayoff interface (implemented for Call and Put), and that would be used in implementations of IPathPayof, which would be implemented by AsianArithmeticMean, DoubleBarrier etc. The use of OO is to pull out the common implementation, but there’s no state associated with any of these classes – it feels much nicer using function composition.
I’ve implemented a version in C++ using lambdas, and the functional composition is actually really nice. C++ has the std::bind method which means that any number of parameters can be partially applied even for functions not in the curried form.
The code can be found here: https://gist.github.com/taumuon/727c31f4a56518f91718
The most surprising thing was the speed increase – the equivalent algorithm in C++ is 20x faster than the F# code! For instance, running on my low-powered AMD A6-1450 powered laptop, the double barrier option took 45 seconds to calculate in F#, whereas it took 2.2 seconds in C++.
I did tweak the F# to instead of using Seq.unfold to create the asset path, it instead creates and mutates an array, and that reduced the running time by more than 25%.
EDIT: Jack Pappas has forked and modified the F# code so that its runtime is now in the same ballpark as the C++ code. The fork is here: https://gist.github.com/jack-pappas/d629a767823ca2f17967
I think there’s scope to speed up both the F# and C++, by introducing parallelism and investigating GPGPU, which I hope to do if I don’t get sidetracked.