## Reducing allocations in parsing GEDCOM files using Span

tl;dr Span<T> allows for string manipulations without having to create temporary strings. On a custom GEDCOM parser this results in far fewer allocations, but string operations on ReadOnlySpan<char> may be slower for very small strings.

## What is GEDCOM?

GEDCOM is a standard interchange file for genealogy information, and most genealogy applications and website (such as ancestry, genesreunited) support import and export to this format.

A snippet of a GEDCOM file is:

0 @I303@ INDI
1 NAME James  /Shearsby/
1 SEX M
1 BIRT
2 DATE ABT 1846
0 @I380@ INDI
1 DEAT
2 PLAC Warwickshire

The file is line based, and the first item in the line is the GEDCOM level. Every line with a level number greater than the current level item relates to that item. For example, in the snippet above the DATE is obviously associated to the parent BIRT (birth) record.

I wrote a custom GEDCOM parser nearly 15 years ago, for family tree research, and used it in a family tree application http://www.taumuon.co.uk/2013/02/viewgene.html. The parser is not complete – it won’t recognise all elements present in a GEDCOM file, and it could do with a redesign. If I was to start it from scratch today I’d probably investigate using parser combinators, but it gets the job done, but it’s still an interesting exercise to see the effect of changing to Span<T>

## What is Span<T>

Span<T> allows for a view or slice to be taken into contiguous memory, holding a pointer to the beginning of the slice and a slice length. The Span<T> is limited so that it can only live on the stack. This allows for manipulating sub-buffers or sub-strings in a memory safe way (performance improvements are possible that may have previously needed unsafe code).

Being able to take a slice of memory is useful in a parser where, for instance, a substring needs to be extracted to be passed on Int.Parse(). There are no overloads to Int.Parse that take an index and a length – so a substring needs to be obtained requiring a new allocation (There are third party libraries that do allow for generating and manipulating StringViews: https://github.com/egmkang/StringView but one downside to StringViews is that as all the views hold a reference to the original string, it’s possible to inadvertently keep the large original string alive longer than intended).

There are new string manipulation methods available in .NET core that take a ReadOnlySpan<char>, allowing a slice into a string to be manipulated (no need to allocate an intermediate string). In a GEDCOM line, the level is identified by the number before the first space, and the remainder of the string is passed on to by processed by the type-specific sub-parsers.

## Parser

The code for the parser and benchmarks can be found here: www.github.com/taumuon/GedcomParser

The parser is written in a style preferring clarity over performance, it extracts out substrings where required, then compares those substring to known tag identifiers or parses numbers using Int.Parse. To improve performance, I could have written a custom Int.Parse method, along with string equality methods, but a nice side-effect of keeping the code more idiomatic is that it does provide a simple one-to-one comparison of pre-span and span code.

The changes for Span<T> were pretty trivial, instead of extracting data from the file line-by-line (using StreamReader.ReadLine()), a char array buffer is used, and StreamReader.Read() is used to populated that buffer. New-line delimited slices of ReadOnlySpan<char> s are extracted from the buffer, meaning that no allocations are necessary for the data read from the stream.

## Performance

I generated a larger 50MB GEDCOM file, by taking royal92.GED and duplicating its contents 100 times (this file is definitely not a large file by data processing standards, but it’s on the medium to large size for a GEDCOM file, and benchmarkdotnet takes a while to perform all its runs with any larger file). I generated a larger 500MB file to test performance manually.

The span and non-span versions of the code have the same API, and live in different assemblies. The classes all have the same names but live in different namespaces. This is simply so that the benchmark project was able to easily reference both versions of the parser.

I setup a benchmark to run the span and non-span version of the parser with both a small text string, and the larger GEDCOM file.

Running on .NET Core 2.2., the results from benchmarkdotnet are:

 Method IsFile Mean Ratio Gen0/1k op Alloc Mem/op FileParserSpan False 28.60 us 0.84 7.0190 14.41 KB FileParser False 31.55 us 1.00 11.6272 23.83 KB FileParserSpan True 1,727ms 1.04 124000 255 MB FileParser True 1,665ms 1.00 369000 757 MB

The IsFile flag indicates whether parsing is done of a small 70 line string constant (IsFile=False), or a larger file on disc.

For the scenario of parsing a small 70 line GEDCOM, there is a slight but insignificant performance improvement for Span<T>. For larger files (~50MB), Span<T> actually appears to be slower. I’m not sure what the reason for the slow down is, I’d expect that reducing allocations would improve performance, unless that was counteracted by string operations on Span<T> being slower than on String.

The benchmark shows that the garbage collector wasn’t overly stressed, with no gen1 or gen2 collections. To show that using Span would be beneficial if the garbage collector had more work to do, I created a huge (500MB) GEDCOM file, which was too large to benchmark, but I created a simple console application to parse the file and found a massive 2x speed up using the Span version of the parser (98 seconds to parse vs 226 seconds). This is an amazing change just for making a few changes in the code.

Of course, most real-world files aren’t anywhere near this size, so it’s questionable whether it would be worth making the change. In a real-world application, parsing may not be the only operation happening, the parsing logic may occur in the context of a larger application which is obviously creating many other allocations, and even though it can’t be measured by a standalone benchmark, anything to reduce unnecessary work for the garbage collector may be beneficial for performance in the application. In the real world it would be necessary to measure the performance impact of making the change in a realistic measurable scenario.

Even though the garbage collector doesn’t have much work to do when parsing a small GEDCOM snippet, there will still be many fewer gen 0 collections, so I was surprised that the performance improvement wasn’t greater, and was surprised that parsing a medium size file actually seemed slow (this is especially confusing as parsing a large file in the console application is so many times quicker, there may be something strange happening with the bencharkdotnet benchmark).

I suspected that the operations on MemoryExtensions, Equals() and IndexOf may be slower than the equivalent string operations. Running the parser under a performance profiler did indicate that these are hotspots in the code. To test this out in an isolated case, I created a separate benchmark to investigate this (the BenchmarkIndexOf class in the benchmark project).

The benchmark logic compares using IndexOf directly on a string, compared to obtaining a ReadOnlySpan<char> from the string, and using the IndexOf method on MemoryExtensions. The methods that use IndexOf on String still obtain the ReadOnlySpan<char> to ensure that the only difference in the methods is how the IndexOf is performed. This is done for a small, medium and large(-ish) string:

 Method Mean IndexOfStrSmall 8.966 ns IndexOfSpanSmall 10.724 ns IndexOfStrMed 23.150 ns IndexOfSpanMed 23.224 ns IndexOfStrLarge 94.046 ns IndexOfSpanLarge 86.213 ns

For small strings, it does appear that the MemoryExtensions.IndexOf() method does have a greater overhead than String.IndexOf(), but for larger strings the performance of span is better than for string, so it appears that the operations for ReadOnlySpan<char> are optimised for larger strings compared with the operations directly on String.

Still, Span<T> has shown to have potential for fantastic performance improvements.

## Realtime soft body physics with Finite Elements

I’ve had a play with implementing soft body physics as described in the books Physics-based animation and Game Programming Pearls. (BTW I’ve no idea what’s going on with the price of Physics Based Animation on Amazon – I have a hardback copy available if anyone wants to buy it at that price :D)

The demo scene is a wall of cubes, anchored at the base of the wall. Each cube is made up of six tetrahedra. A force is applied to the top-right node causing the nodes to move and the elements to deform. The stiffness of the blocks is relatively low, and I haven’t applied any damping, so the resulting behaviour is as if the wall is made of jelly.

The description of the algorithm is described pretty similarly in both books, but there are enough subtleties it’s useful to cross-reference between the books where things are unclear. My code most closely follows Physics Based Pearls.

The code makes heavy use of the Eigen matrix library, and its conjugate gradient solver. As the Eigen library makes heavy use of proxy objects via template expressions, I did get caught out more than a couple of times with assigning the result of a matrix operation to an auto-declared variable.

The rendering is done in OpenGL ES 3.0, so that it runs on desktop graphics supporting OpenGL ES (tested on Intel Skylake), and more recent Android phones (tested on an LG G4).

I used Visual Studio’s cross-platform solution template to create the solution, so the bulk of the code is common between the desktop and mobile, with only a thin platform specific layer.

The code runs fluidly on the desktop with 192 elements and 90 nodes, but runs an order of magnitude slower on mobile.

Code is available on github: https://github.com/taumuon/Finite-Element-Soft-Body-Physics

## Particle Filter in F#

Since my last blog post, I’ve been reading the Probabilistic Robotics book. I coded up a Kalman Filter in C++ (https://github.com/taumuon/Kalman-Filter).

It’s been a while since I’d coded up any F#, so I converted a particle filter over from Python. The code is from the online MOOC Artificial Intelligence for Robotics from Udacity.

The idea is to perform robot localisation, by creating a set of particles randomly over the entire state space. On a noisy sensor reading, the probability of that sensor reading having being made for each particle is calculated. Then, all particles undergo a weighted resampling (weighted by the measurement probability).

In this example, the sensor reading is the distance to four beacons at a known position in space. The same algorithm is used in Simultaneous Location and Mapping (SLAM) applications, where the sensor readings could be sonar, radar, visual etc.

The F# code is nice and compact at only 94 lines. The project is available at https://github.com/taumuon/Particle-Filter-FSharp

## Optimal Control of Double Integrator with Dynamic Programming

I showed in my last blog post how LQR control can stabilize a cart pole around its unstable fixed point.

However, LQR control doesn’t help to swing the cart pole up into the balance point from states which are far away from the fixed point.

The underactuated robotics course discusses energy pumping techniques, to add or remove energy from the system until it reaches the homoclinic orbit (the trajectory which passes through the unstable fixed point) – and when the state is near the fixed point the fixed control scheme is switched to LQR control.

The course shows an alternate approach – using dynamic programming to algorithmically find the time-minimal optimal control from any position in the state-space. The lectures and homework shows an example of the dynamic programming algorithm on the double integrator system, and I replicated this in C# to get confidence that I had understood and implemented the algorithm correctly.

The animations show how the optimal control, and the cost function, are iterated to the converged solution, via value iteration:

One thing that took me a while to figure out, and I didn’t see straight away from the notes, is that as this is an infinite horizon problem, the gamma term is necessary in adding to the previous cost.

A sample trajectory using the analytical solution is shown here:

Using the dynamic programming approach for a state space discretized in x between -10 and 10 with 49 grid and x_dot discretised between -5 and 5 with 49 grid points gives the following:

The trajectory does reach the set point, but it is skewed from the analytical solution, and the acceleration isn’t at -1 on the approach to the target, but is somewhere between -1 and 0.

This is because of the discretisation, the grid points around the switching surface are too large, so the values are being interpolated where they should be saturated at 1 or -1. Changing the state space to be more focused on this trajectory (x between -5 and 5, and x_dot between -2.5 and 2.5) with 199 points, gives a result much closer to the analytical solution:

I haven’t yet coded this up for the cart pole problem, the core of the algorithm is the same, but with more dimensions. The only real part of the algorithm that changes is the volumetric interpolation. I don’t want to recreate Python’s ndarray in C#. I might look at a C++ implementation using Eigen’s tensor class.

The next topic to investigate is reinforcement learning (the solution in Drake uses a Markov Decision Process).

Resources:

## LQR Control of Inverted Pendulum in C++ using Eigen

This blog has been quite for ages as I’ve been spending most of my commute time participating in MOOCS. One course that I found especially interesting, but have not yet found the time to complete, is underactuated robotics. I’m slowly exploring some of the ideas in my own time, and that’s what this blog post is about.

The course explores exploiting the non-linear dynamics of a system, to create more efficient, elegant control schemes. There are lots of interesting videos on youtube of acrobots, wheeled walkers, double inverted pendulums etc.

The first week investigates the non-linear dynamics of the pendulum, and I investigated that using iPython Notebooks here (or in nbviewer).

Python notebooks provide a rich documentation environment; it’s easy to generate charts and animations, but I found the actual writing and debugging of python code in a notebook to be quite painful, so have switched to C++ for the cart pole example.

This blog post will simply cover the LQR control of an inverted pendulum at its unstable fixed point (similar to balancing a pencil on a finger). I’ll hopefully get around to investigating the non-linear swing-up problem in the near future.

I’ve simply copied the below equations out of the underactuated notes for ease of reference (it’s strongly recommended to read the notes).

The equations of motion for a cart pole are:

$$(m_c + m_p) \ddot{x} + m_p l \ddot{\theta}cos {\theta} – m_pl \dot{\theta}^2 sin {\theta} = f$$

$$m_p l \ddot{x} cos {\theta} + m_p l^2 \ddot{\theta} + m_p g l sin {\theta} = 0$$

Directly solving them for $\ddot{x}$ and $\ddot{\theta}$ gives:

$$\ddot{x} = \frac{1}{m_c + m_p sin^2 {\theta}} [f + m_p sin{\theta}(l \dot{\theta}^2 + g cos{\theta})]$$

$$\ddot{\theta} = \frac{1}{l(m_c + m_p sin^2 \theta)} [-f cos \theta -m_pl\dot{\theta}^2 cos{\theta}sin{\theta}-(m_c+m_p)g sin{\theta}]$$

I haven’t pulled out the linearised equations, or the derivation of the LQR control, as they are described clearly on the umich website (see the links at the end of this post).

For my implementation code, I solve the LQR K matrix offline using GNU Octave’s Control package. I may at some point investigate if there are any resources on solving the algebraic Riccati equation online, or otherwise calling Octave from C++.

For the simulation, I investigated the cart pole with an initial pole angular displacement of 0.1 radians from the unstable fixed point.

The LQR control applied to the linear system gives the following. The control does well with fast settling time and minimal overshoot.

The LQR control applied to the non-linear system does stabilise the system, but it is more oscillatory. Later lectures of the course do discuss finding where in the state-space the LQR control is applicable, and I may get around to investigating these ideas at some point.

The rest of the blog post will discuss the implementation.

The code simply simulates the system from the initial conditions, finding the desired control input at each step given the current system state. It writes out the resulting values of x and ${\theta}$ to a space-delimited text file.

The plotting is done externally using GNUPlot. Having to round-trip out to an external program is annoying, and I may investigate charting libraries in Qt at some point.

I’m using the Eigen matrix library, which is a header-only library. It’s simple to use, and uses the processor’s SIMD instruction set for performance.

I did originally run into issues with Eigen due to alignment. For SIMD to work, the memory allocations are aligned on a 16-byte boundary. However, the Visual C++ compiler doesn’t maintain alignment on passing parameters into functions by value, and the Eigen documentation recommends passing by const ref.

Passing matrices by const ref into methods is fine, but the issue I hit is that I have a function that returns a lambda, which time steps the system given the current system state, the time delta, and the control input. This allows the linear and non-linear systems to be abstracted away from the generation of the overall output trajectory.

That function can’t generate the matrices on each call, as they’re quite expensive to calculate – it wants to capture the matrices by value, as closing over by reference would obviously reference objects which are destroyed by the time the lambda is called. However, trying to capture by value runs into the same alignment errors.

I tried to pass the CartPoleLinear by value, but it warned about alignment issues, even though it used Eigen’s recommended EIGEN_MAKE_ALIGNED_OPERATOR_NEW define. My first solution was to store each of the matrices in a shared_ptr, to ensure that the lambda kept them alive after the factory function returns. This was nasty – to store Eigen matrices in a shared_ptr requires that a custom allocator is used.

auto GetTimeStepFunctionCartPoleLinear(SystemParams system_params)
{
// This is nasty, but Eigen has alignment issues if capturing by value

CartPoleLinear cartPoleLinear(system_params);
Eigen::aligned_allocator<Eigen::Matrix4d> a_allocator;
Eigen::aligned_allocator<Eigen::Matrix<double, 4, 1>> b_allocator;
Eigen::aligned_allocator<Eigen::Matrix<double, 2, 4>> c_allocator;
Eigen::aligned_allocator<Eigen::Matrix<double, 2, 1>> d_allocator;

auto pA = std::allocate_shared<Eigen::Matrix4d>(a_allocator, cartPoleLinear.AMatrix());
auto pB = std::allocate_shared<Eigen::Matrix<double, 4, 1>>(b_allocator, cartPoleLinear.BMatrix());
auto pC = std::allocate_shared<Eigen::Matrix<double, 2, 4>>(c_allocator, cartPoleLinear.CMatrix());
auto pD = std::allocate_shared<Eigen::Matrix<double, 2, 1>>(d_allocator, cartPoleLinear.DMatrix());

return [=](auto state, auto delta_time, auto u)
{
auto A = *pA;
auto B = *pB;
auto C = *pC;
auto D = *pD;

My second solution was to instead of capturing shared_ptrs, instead capture std::arrays which don’t have the alignment problem. The code is still messy having to marshall the data to and from matrices in arrays (and having to create new matrices every time that the lambda is called).

auto GetTimeStepFunctionCartPoleLinear(SystemParams system_params)
{
// This is nasty, but Eigen has alignment issues if capturing by value

std::array<double, 16> a_array;
std::array<double, 4> b_array;
std::array<double, 8> c_array;
std::array<double, 2> d_array;

CartPoleLinear cart_pole_linear(system_params);

Eigen::Map<Eigen::Matrix<double, 4, 4>>(a_array.data(), 4, 4) = cart_pole_linear.AMatrix();
Eigen::Map<Eigen::Matrix<double, 4, 1>>(b_array.data(), 4, 1) = cart_pole_linear.BMatrix();
Eigen::Map<Eigen::Matrix<double, 2, 4>>(c_array.data(), 2, 4) = cart_pole_linear.CMatrix();
Eigen::Map<Eigen::Matrix<double, 2, 1>>(d_array.data(), 2, 1) = cart_pole_linear.DMatrix();

return [=](auto state, auto delta_time, auto u)
{
Eigen::Matrix<double, 4, 4 >> A(a_array.data());
Eigen::Matrix<double, 4, 1 >> B(b_array.data());
Eigen::Matrix<double, 2, 4 >> C(c_array.data());
Eigen::Matrix<double, 2, 1 >> C(d_array.data());

The eventual solution was when I realised that the alignment issues are only in 32 bit builds. In 64 bit builds, Visual C++ aligns the parameters on 16 byte boundaries, so the sane code works. I’ve removed the 32 bit builds from the solution, but if I cared about 32 bit builds I’d disable the Eigen alignment instead of the nasty workaround above. The code is now simply:

auto GetTimeStepFunctionCartPoleLinear(SystemParams system_params)
{
CartPoleLinear cart_pole_linear(system_params);

auto A = cart_pole_linear.AMatrix();
auto B = cart_pole_linear.BMatrix();
auto C = cart_pole_linear.CMatrix();
auto D = cart_pole_linear.DMatrix();

return [=](auto state, auto delta_time, auto u)
{
Eigen::Vector4d state_vector(state.data());

Eigen::Matrix<double, 4, 1> state_dot = (A * state_vector) - (B * u);

auto new_state = state_vector + (state_dot * delta_time);
return State{ new_state[0], new_state[1], new_state[2], new_state[3] };
};
}

The GNUPlot commands for plotting are:

set terminal png size 400,300 enhanced font "arial,20"
set output "linear.png"
plot 'nonlinear.txt' using 1:2 title 'x' with lines, 'nonlinear.txt' using 1:3 title 'theta' with lines

I intend to investigate the swing up problem next. Another feature I miss from iPython Notebooks is JSAnimation, where it’s quite easy to produce a simple animation of the system. To be fair, it’s not that easy to produce a more complex animation. I’m planning to look at drawing using Cairo and then produce an animation from individual frames.

The code is available on github.

Resources:

## .NET SIMD to solve Burgers’ equation

I’m enrolled on the mooc Practical Numerical Methods in Python, and module 2 has covered the Burgers’ Equation. This is a great course by the way, and it’s not too late to join up!

I am impressed with the productivity of Python. ipython notebooks, SymPy, plotting etc. are all great and I feel that .NET could benefit from a similar ecosystem.

That said, I’ve been itching to play with the new .NET SIMD support and this seemed to be a suitable example.

The Python notebook demonstrated how using Python’s array notation to solve the Burgers’ equation using finite difference resulted in a 9 fold speed increase, on  my machine taking 25ms instead of 200ms.

I translated the code into C#, and it completed in 2ms, so I immediately was impressed (not surprised though – an interpreted versus (JIT) compiled language…). This wasn’t going to tell me much about SSE improvements though, so I increased the number of spatial samples a hundred fold, and the number of timesteps by 10.

The time was then increased to 2580ms.

Installing .NET 4.5.2 and using RyuJIT CTP4 to compile and run, the running time reduced to 1880ms. RyuJIT can’t arrive fast enough!

I then implemented array swappping, instead of continually creating and copying arrays, to see if it would reduce pressure on the GC (Calculate2 in the gist), but it had no significant difference.

I then pulled the calculation of (nu * dt / dx) into a variable (Calculation3())

This resulted in a calculation time of 230ms.

Then, I felt it was time to implement SIMD. The documentation is a bit scarce, but it wasn’t too tricky to figure out. My machine has 128bit SSE registers, so it can add two doubles (64 bit) in a single operation.

Running this resulted in a calculation time of 90ms, so slightly greater than 2x speedup. I’m not sure whether it was able to make further optimisations based on the changed structure of the code, but this is a great win. This is implemented in Calculation4() in the gist.

The gist is available here: https://gist.github.com/taumuon/5d4db567c57f9846971d

## Monte Carlo double barrier option pricing on GPU using C++ AMP

I converted the double barrier option pricing code from my last post to run on the GPU, using C++ AMP, and got an 80% speedup.

This is running on a low powered Acer V5-122p laptop with an AMD A6-1450 processor with an integrated Radeon HD 8250 GPU.

The gist is here: https://gist.github.com/taumuon/bbeeb9e2c1f5082a2699

To be fair, I converted the CPU code to be otherwise identical to the gpu code. Instead of populating an array of the sample path, it instead for each point just determines whether the value breaches the upper or lower barriers and uses the last value for the payoff.

This reduced the runtime from the code in my last blog post, of 2540ms, to 1250ms, i.e. a 2x speedup.

The GPU code was ran twice, the first time it ran in 140ms, the second run (after all shaders already compiled etc) was 15.6ms, i.e. a very impressive 80x speedup from the CPU code.

If anything, it shows that AMDs strategy of cheap low powered laptop chips will payoff if people start taking advantage of the relatively strong GPU.

## Monte Carlo pricing of Exotic Options in Functional Style C++

My last blog post was about pricing of exotic options using F#. In that post I said “Apart from the Monte Carlo code, the fact that in F# all functions are in the curried form means that composing the payoff functions using partial function application is beautiful. This would be lambda hell to replicate in C#.”

The OO version of this would probably be built via composition: having a class implementing a IPayoff interface (implemented for Call and Put), and that would be used in implementations of IPathPayof, which would be implemented by AsianArithmeticMean, DoubleBarrier etc. The use of OO is to pull out the common implementation, but there’s no state associated with any of these classes – it feels much nicer using function composition.

I’ve implemented a version in C++ using lambdas, and the functional composition is actually really nice. C++ has the std::bind method which means that any number of parameters can be partially applied even for functions not in the curried form.

The code can be found here: https://gist.github.com/taumuon/727c31f4a56518f91718

The most surprising thing was the speed increase – the equivalent algorithm in C++ is 20x faster than the F# code! For instance, running on my low-powered AMD A6-1450 powered laptop, the double barrier option took 45 seconds to calculate in F#, whereas it took 2.2 seconds in C++.

I did tweak the F# to instead of using Seq.unfold to create the asset path, it instead creates and mutates an array, and that reduced the running time by more than 25%.

EDIT: Jack Pappas has forked and modified the F# code so that its runtime is now in the same ballpark as the C++ code. The fork is here: https://gist.github.com/jack-pappas/d629a767823ca2f17967

I think there’s scope to speed up both the F# and C++, by introducing parallelism and investigating GPGPU, which I hope to do if I don’t get sidetracked.

## Monte Carlo pricing of Exotic Options in F#

I’ve recently completed all exercises for the MOOC Monte Carlo methods in Finance, ran by the University of Madrid on Iversity. The course was excellent, both in its content and delivery.

The course started off by refreshing probability theory (cdf, pdf etc) and quickly moved onto generating random numbers, Brownian motion and stochastic differential equations followed by the pricing of options and calculating risk.
The final week’s homework is on the Monte Carlo calculation of a portfolio’s risk, using copulas with Student’s t distribution, but as that week’s homework is still running I’m blogging about the penultimate week’s homework.

This homework is about pricing path dependent options, and demonstrating how using a control variate can provide a better result for the same number of iterations (better meaning less error, i.e. with a smaller standard deviation). The homework further shows that when pricing a barrier
option, using an up-and-out option as the control variate produces better results than using a European option as the control variate.

The course delivery was extremely slick. The videos were filmed specially, this wasn’t a rehashing of a lecture in front of students (as seems to happen in some other online courses). The lectures were short, and to the point, managing to cram a lot of information in with zero fluff. I really appreciated this as I was watching on a daily commute.

The assessments were by multiple choice question, but with only one attempt, so it did end up actually feeling like a challenge. The questions were a mixture of maths and coding, with the balance moving more towards coding as the weeks went on.
The code was delivered in Matlab, which I had to get reacquainted to after a decade break. The main downside I found is that I ended up doing a fair bit of copy-paste coding, so completed some of the homeworks in .NET instead. I personally found the coding easy, and found the maths more challenging (in a good way).

Now, for the code – the F# code is available here: https://gist.github.com/taumuon/11302896

Apart from the Monte Carlo code, the fact that in F# all functions are in the curried form means that composing the payoff functions using partial function application is beautiful. This would be lambda hell to replicate in C#.

If this course runs again I’d wholeheartedly recommend it.

## MOOCs

I’ve just enrolled on Iversity’s Monte Carlo Methods in Finance course, and have converted some of week 1’s Matlab demo code over to F# and Deedle: https://gist.github.com/taumuon/8602365

I spent the latter half of last year diving into various online courses. I completed Coursera’s Mathematical Methods for Quantitative Finance, and also followed along with a number of other courses to varying levels of completeness.

I completed three weeks assignments of Udacity’s Cuda programming course.  Week three was painful due to an error in the reference code, and week 4 was crashing due to a memory exception. I was using a GPU emulator in Ubuntu, and decided that it would be easier with real hardware. I watched the remaining videos and found the parallel algorithms explanations useful.

I completed the first two weeks of Coursera’s Scientific Computing. These were maths exercises, which I enjoyed and that’s what inspired me to do the Maths Methods for Quant Finance course. The Matlab exercises I was planning to do in F#, but left the course when other attendees were complaining that to complete the homework to the correct numerical accuracy you needed the exact version of Matlab the instructor was using, and they were unable to use Gnu Octave.

It is great that there’s so much free high standard material available. The fixed timescale nature of the course is a bit annoying – if work or life gets in the way one week it may make it impossible to catch up with the remainder of the course. I may get around to trying a course again next time it comes around though.