## Realtime soft body physics with Finite Elements

I’ve had a play with implementing soft body physics as described in the books Physics-based animation and Game Programming Pearls. (BTW I’ve no idea what’s going on with the price of Physics Based Animation on Amazon – I have a hardback copy available if anyone wants to buy it at that price :D)

The demo scene is a wall of cubes, anchored at the base of the wall. Each cube is made up of six tetrahedra. A force is applied to the top-right node causing the nodes to move and the elements to deform. The stiffness of the blocks is relatively low, and I haven’t applied any damping, so the resulting behaviour is as if the wall is made of jelly.

The description of the algorithm is described pretty similarly in both books, but there are enough subtleties it’s useful to cross-reference between the books where things are unclear. My code most closely follows Physics Based Pearls.

The code makes heavy use of the Eigen matrix library, and its conjugate gradient solver. As the Eigen library makes heavy use of proxy objects via template expressions, I did get caught out more than a couple of times with assigning the result of a matrix operation to an auto-declared variable.

The rendering is done in OpenGL ES 3.0, so that it runs on desktop graphics supporting OpenGL ES (tested on Intel Skylake), and more recent Android phones (tested on an LG G4).

I used Visual Studio’s cross-platform solution template to create the solution, so the bulk of the code is common between the desktop and mobile, with only a thin platform specific layer.

The code runs fluidly on the desktop with 192 elements and 90 nodes, but runs an order of magnitude slower on mobile.

Code is available on github: https://github.com/taumuon/Finite-Element-Soft-Body-Physics

## LQR Control of Inverted Pendulum in C++ using Eigen

This blog has been quite for ages as I’ve been spending most of my commute time participating in MOOCS. One course that I found especially interesting, but have not yet found the time to complete, is underactuated robotics. I’m slowly exploring some of the ideas in my own time, and that’s what this blog post is about.

The course explores exploiting the non-linear dynamics of a system, to create more efficient, elegant control schemes. There are lots of interesting videos on youtube of acrobots, wheeled walkers, double inverted pendulums etc.

The first week investigates the non-linear dynamics of the pendulum, and I investigated that using iPython Notebooks here (or in nbviewer).

Python notebooks provide a rich documentation environment; it’s easy to generate charts and animations, but I found the actual writing and debugging of python code in a notebook to be quite painful, so have switched to C++ for the cart pole example.

This blog post will simply cover the LQR control of an inverted pendulum at its unstable fixed point (similar to balancing a pencil on a finger). I’ll hopefully get around to investigating the non-linear swing-up problem in the near future.

I’ve simply copied the below equations out of the underactuated notes for ease of reference (it’s strongly recommended to read the notes).

The equations of motion for a cart pole are:

$$(m_c + m_p) \ddot{x} + m_p l \ddot{\theta}cos {\theta} – m_pl \dot{\theta}^2 sin {\theta} = f$$

$$m_p l \ddot{x} cos {\theta} + m_p l^2 \ddot{\theta} + m_p g l sin {\theta} = 0$$

Directly solving them for $\ddot{x}$ and $\ddot{\theta}$ gives:

$$\ddot{x} = \frac{1}{m_c + m_p sin^2 {\theta}} [f + m_p sin{\theta}(l \dot{\theta}^2 + g cos{\theta})]$$

$$\ddot{\theta} = \frac{1}{l(m_c + m_p sin^2 \theta)} [-f cos \theta -m_pl\dot{\theta}^2 cos{\theta}sin{\theta}-(m_c+m_p)g sin{\theta}]$$

I haven’t pulled out the linearised equations, or the derivation of the LQR control, as they are described clearly on the umich website (see the links at the end of this post).

For my implementation code, I solve the LQR K matrix offline using GNU Octave’s Control package. I may at some point investigate if there are any resources on solving the algebraic Riccati equation online, or otherwise calling Octave from C++.

For the simulation, I investigated the cart pole with an initial pole angular displacement of 0.1 radians from the unstable fixed point.

The LQR control applied to the linear system gives the following. The control does well with fast settling time and minimal overshoot.

The LQR control applied to the non-linear system does stabilise the system, but it is more oscillatory. Later lectures of the course do discuss finding where in the state-space the LQR control is applicable, and I may get around to investigating these ideas at some point.

The rest of the blog post will discuss the implementation.

The code simply simulates the system from the initial conditions, finding the desired control input at each step given the current system state. It writes out the resulting values of x and ${\theta}$ to a space-delimited text file.

The plotting is done externally using GNUPlot. Having to round-trip out to an external program is annoying, and I may investigate charting libraries in Qt at some point.

I’m using the Eigen matrix library, which is a header-only library. It’s simple to use, and uses the processor’s SIMD instruction set for performance.

I did originally run into issues with Eigen due to alignment. For SIMD to work, the memory allocations are aligned on a 16-byte boundary. However, the Visual C++ compiler doesn’t maintain alignment on passing parameters into functions by value, and the Eigen documentation recommends passing by const ref.

Passing matrices by const ref into methods is fine, but the issue I hit is that I have a function that returns a lambda, which time steps the system given the current system state, the time delta, and the control input. This allows the linear and non-linear systems to be abstracted away from the generation of the overall output trajectory.

That function can’t generate the matrices on each call, as they’re quite expensive to calculate – it wants to capture the matrices by value, as closing over by reference would obviously reference objects which are destroyed by the time the lambda is called. However, trying to capture by value runs into the same alignment errors.

I tried to pass the CartPoleLinear by value, but it warned about alignment issues, even though it used Eigen’s recommended EIGEN_MAKE_ALIGNED_OPERATOR_NEW define. My first solution was to store each of the matrices in a shared_ptr, to ensure that the lambda kept them alive after the factory function returns. This was nasty – to store Eigen matrices in a shared_ptr requires that a custom allocator is used.

auto GetTimeStepFunctionCartPoleLinear(SystemParams system_params)
{
// This is nasty, but Eigen has alignment issues if capturing by value

CartPoleLinear cartPoleLinear(system_params);
Eigen::aligned_allocator<Eigen::Matrix4d> a_allocator;
Eigen::aligned_allocator<Eigen::Matrix<double, 4, 1>> b_allocator;
Eigen::aligned_allocator<Eigen::Matrix<double, 2, 4>> c_allocator;
Eigen::aligned_allocator<Eigen::Matrix<double, 2, 1>> d_allocator;

auto pA = std::allocate_shared<Eigen::Matrix4d>(a_allocator, cartPoleLinear.AMatrix());
auto pB = std::allocate_shared<Eigen::Matrix<double, 4, 1>>(b_allocator, cartPoleLinear.BMatrix());
auto pC = std::allocate_shared<Eigen::Matrix<double, 2, 4>>(c_allocator, cartPoleLinear.CMatrix());
auto pD = std::allocate_shared<Eigen::Matrix<double, 2, 1>>(d_allocator, cartPoleLinear.DMatrix());

return [=](auto state, auto delta_time, auto u)
{
auto A = *pA;
auto B = *pB;
auto C = *pC;
auto D = *pD;

My second solution was to instead of capturing shared_ptrs, instead capture std::arrays which don’t have the alignment problem. The code is still messy having to marshall the data to and from matrices in arrays (and having to create new matrices every time that the lambda is called).

auto GetTimeStepFunctionCartPoleLinear(SystemParams system_params)
{
// This is nasty, but Eigen has alignment issues if capturing by value

std::array<double, 16> a_array;
std::array<double, 4> b_array;
std::array<double, 8> c_array;
std::array<double, 2> d_array;

CartPoleLinear cart_pole_linear(system_params);

Eigen::Map<Eigen::Matrix<double, 4, 4>>(a_array.data(), 4, 4) = cart_pole_linear.AMatrix();
Eigen::Map<Eigen::Matrix<double, 4, 1>>(b_array.data(), 4, 1) = cart_pole_linear.BMatrix();
Eigen::Map<Eigen::Matrix<double, 2, 4>>(c_array.data(), 2, 4) = cart_pole_linear.CMatrix();
Eigen::Map<Eigen::Matrix<double, 2, 1>>(d_array.data(), 2, 1) = cart_pole_linear.DMatrix();

return [=](auto state, auto delta_time, auto u)
{
Eigen::Matrix<double, 4, 4 >> A(a_array.data());
Eigen::Matrix<double, 4, 1 >> B(b_array.data());
Eigen::Matrix<double, 2, 4 >> C(c_array.data());
Eigen::Matrix<double, 2, 1 >> C(d_array.data());

The eventual solution was when I realised that the alignment issues are only in 32 bit builds. In 64 bit builds, Visual C++ aligns the parameters on 16 byte boundaries, so the sane code works. I’ve removed the 32 bit builds from the solution, but if I cared about 32 bit builds I’d disable the Eigen alignment instead of the nasty workaround above. The code is now simply:

auto GetTimeStepFunctionCartPoleLinear(SystemParams system_params)
{
CartPoleLinear cart_pole_linear(system_params);

auto A = cart_pole_linear.AMatrix();
auto B = cart_pole_linear.BMatrix();
auto C = cart_pole_linear.CMatrix();
auto D = cart_pole_linear.DMatrix();

return [=](auto state, auto delta_time, auto u)
{
Eigen::Vector4d state_vector(state.data());

Eigen::Matrix<double, 4, 1> state_dot = (A * state_vector) - (B * u);

auto new_state = state_vector + (state_dot * delta_time);
return State{ new_state[0], new_state[1], new_state[2], new_state[3] };
};
}

The GNUPlot commands for plotting are:

set terminal png size 400,300 enhanced font "arial,20"
set output "linear.png"
plot 'nonlinear.txt' using 1:2 title 'x' with lines, 'nonlinear.txt' using 1:3 title 'theta' with lines

I intend to investigate the swing up problem next. Another feature I miss from iPython Notebooks is JSAnimation, where it’s quite easy to produce a simple animation of the system. To be fair, it’s not that easy to produce a more complex animation. I’m planning to look at drawing using Cairo and then produce an animation from individual frames.

The code is available on github.

Resources:

## Monte Carlo double barrier option pricing on GPU using C++ AMP

I converted the double barrier option pricing code from my last post to run on the GPU, using C++ AMP, and got an 80% speedup.

This is running on a low powered Acer V5-122p laptop with an AMD A6-1450 processor with an integrated Radeon HD 8250 GPU.

The gist is here: https://gist.github.com/taumuon/bbeeb9e2c1f5082a2699

To be fair, I converted the CPU code to be otherwise identical to the gpu code. Instead of populating an array of the sample path, it instead for each point just determines whether the value breaches the upper or lower barriers and uses the last value for the payoff.

This reduced the runtime from the code in my last blog post, of 2540ms, to 1250ms, i.e. a 2x speedup.

The GPU code was ran twice, the first time it ran in 140ms, the second run (after all shaders already compiled etc) was 15.6ms, i.e. a very impressive 80x speedup from the CPU code.

If anything, it shows that AMDs strategy of cheap low powered laptop chips will payoff if people start taking advantage of the relatively strong GPU.

## Monte Carlo pricing of Exotic Options in Functional Style C++

My last blog post was about pricing of exotic options using F#. In that post I said “Apart from the Monte Carlo code, the fact that in F# all functions are in the curried form means that composing the payoff functions using partial function application is beautiful. This would be lambda hell to replicate in C#.”

The OO version of this would probably be built via composition: having a class implementing a IPayoff interface (implemented for Call and Put), and that would be used in implementations of IPathPayof, which would be implemented by AsianArithmeticMean, DoubleBarrier etc. The use of OO is to pull out the common implementation, but there’s no state associated with any of these classes – it feels much nicer using function composition.

I’ve implemented a version in C++ using lambdas, and the functional composition is actually really nice. C++ has the std::bind method which means that any number of parameters can be partially applied even for functions not in the curried form.

The code can be found here: https://gist.github.com/taumuon/727c31f4a56518f91718

The most surprising thing was the speed increase – the equivalent algorithm in C++ is 20x faster than the F# code! For instance, running on my low-powered AMD A6-1450 powered laptop, the double barrier option took 45 seconds to calculate in F#, whereas it took 2.2 seconds in C++.

I did tweak the F# to instead of using Seq.unfold to create the asset path, it instead creates and mutates an array, and that reduced the running time by more than 25%.

EDIT: Jack Pappas has forked and modified the F# code so that its runtime is now in the same ballpark as the C++ code. The fork is here: https://gist.github.com/jack-pappas/d629a767823ca2f17967

I think there’s scope to speed up both the F# and C++, by introducing parallelism and investigating GPGPU, which I hope to do if I don’t get sidetracked.

## Picking with DirectX 11 and Bullet Physics

I updated the falling cubes demo to separate out the rendering and physics into separate WinRT components – see the WinRT subfolder under http://github.com/taumuon/Taumuon.Game . The performance of this wasn’t noticeably different when consumed by a WP8 DirectX application, but I then switched over to a “DirectX with XAML application”, so that the application was hosted via C#, and consumed the WinRT components via a DrawingSurfaceBackgroundGrid, and performance fell off a cliff.

I created the C# host as I intended to put as much game logic as I could into an F# portable library, and the only way to do this is to have a C# hosting application assembly as WP8 apps, unlike Windows Store apps, don’t allow WinRT(P) components to be created in anything other than C++.

Instead, of fighting the performance issues, I decided that it would be better to implement the game logic purely in C++ – see the WinRT subfolder under http://github.com/taumuon/Taumuon.Game . This replaces most uses of the C++/CX extensions with standard C++ types, and implements picking, loosely based off http://halogenica.net/ray-casting-and-picking-using-bullet-physics

It gives a childish sense of satisfaction to do something as simple as destroy a wall of bricks so that the wall collapses.

Once I get time, the next step is to make the wall collapse more realistically by adding spring constraints between adjacent blocks, and to remove those springs once they are stretched beyond some elastic limit.

## Bullet Physics demo DirectX C++ app on Windows Phone 8

I’ve finally gotten around to updating my original Windows Store Bullet Physics demo (http://www.taumuon.co.uk/2012/04/bullet-physics-and-directx-11-on-windows-8-metro.html) to build for the Windows 8 final release.

While I was at it, I also got it working for Windows Phone 8 – code is on github https://github.com/taumuon/WP8-Bullet. There weren’t many changes. In building Bullet for ARM, I got around lots of parameter alignment issues by defining BT_USE_DOUBLE_PRECISION. This was the quickest thing to do, and may have performance implications but the performance running on my Nokia Lumia 920 seems encouraging. It wouldn’t have been a much bigger change to instead see if ARM is defined.

## State Monad in C# and F#

My colleagues and I recently have been trying to gain a deeper understanding of monads, other than using LINQ and RX. Watching all the channel9 videos on monads, the concept on the surface seemed quite simple, but we were felt that we were somehow missing some key point. After reading various blog posts, the penny is finally dropping.

Some post I read somewhere (I can’t remember where) said that it’s easier to understand the individual monads before trying to understand the whole generalised concept. Mike Hadlow’s blog series (http://mikehadlow.blogspot.co.uk/2011/01/monads-in-c1-introduction.html) really explain well the identity and maybe monads (which we already understood, which is just as well as we’re using the maybe monad in our codebase). Unfortunately he didn’t explain the state monad.

The only information we could find on the state monad in C# was Brian Beckman’s Channel 9 video: http://channel9.msdn.com/Shows/Going+Deep/Brian-Beckman-The-Zen-of-Expressing-State-The-State-Monad

We found Brian Beckman’s example overly complex, there’s quite a lot going on in that video (and the code needed de-obfuscating to read more like idiomatic C#), but it’s probably a good second or third example once the basics are understood. I wanted to look at a simpler example, and found this article: http://ertes.de/articles/monads.html#section-6 the explanation of the get and put methods show pretty clearly how the state monad works, and you should probably read and understand that section before carrying on here, as I’m going to jump straight on to show how the random number generator example looks in F# and C#.

### C# implementation

First off, here’s an OO random number generator:

It is easy to add three random numbers together using an instance of the generator.

We can make NextShort purely functional, so that calling it has no side-effects, by passing the current state in, and returning the new state back out, along with the result:

To add three numbers, the state has to be kept around, and passed to each of the calls in turn. This obfuscates the main business logic.

Jumping straight to the punch line, using a state monad means that the boilerplate state management can be hidden away so that the algorithm reads as clearly as it did before:

This is a trivial example, but hopefully it makes the point. The rest of this blog post will show the implementation, along with the same in F#.

First off, we have a state class, which holds a Func that given a state, returns a tuple of the state and value. The F# example doesn’t bother with the State class, and I could have made the C# version simply use the Func everywhere, but it seemed clearer to me to have the wrapper object just to get a bit of context.

I created a non-generic State class as somewhere convenient to put the Get and Set methods:

The meat of the monad lives in extension methods in a StateEx class:

and SelectMany is implemented in terms of Bind the same as in Mike Hadlow’s example for the Identity and Maybe monads:

The computation to compose can be expressed in the GetRandom method:

This returns an operation (a Func in the State container) that invokes NextShort with the old state, and returns a computation with the new state and value.

The resulting composed operation using LINQ is described above, but this is what it looks like desugared:

### F# implementation

The code for the random number generator and composed operation look similar to the C#:

This has the same problem of explicitly passing the state around. I can ruin the surprise again and show how using monads hides the explicit state passing:

This code makes use of F#’s support for monads via Computation Expressions. The code for the state builder is from here: http://www.navision-blog.de/2009/10/23/using-monads-in-fsharp-part-i-the-state-monad/

The Haskell example also had the >> operator, but of course F# uses that as the forward composition operator J

getRandom looks pretty much identical to the Haskell example:

The composed version using the StateBuilder is shown above, but here’s the desugared version:

The F# version is much clearer to read than the C#, in part due to the type inference meaning that the code has less pointless fluff. Having functions live in the module and not as part of a class aids that too (can simply call setState instead of State.Set in the C#)

Following through the simpler example really helped my understanding, and hopefully will see examples in the future where it will be sensible to use the state monad in my coding.

## Monte Carlo C++ AMP

This blog is starting to look very inconsistent – my last blog post was talking about starting to write a game, and now I’ve gone onto a totally different topic. Due to time the game’s gone onto, if not the back burner, then a gentle simmer. I’ve got to balance doing cool stuff with keeping on top of various technologies that may be relevant as a contractor, and having a life!

This blog will describe using C++ AMP for calculating Pi; the calculation of Pi is a fairly good example of Monte Carlo calculations, as the algorithm’s simple, and we all know the result.

First off, this is what the algorithm looks like in C#:

It simply finds the number of circles representing x-y coordinate pairs, whose vector magnitude falls within the unit circle. This ratio is Pi / 4.

The random class is the Mersenne Twister code from here: http://takel.jp/mt/MersenneTwister.cs. The results are:

Replacing the Mersenne Twister class with System.Random resulted in code which was approximately 30% slower. I’m running this on a dual-core laptop, but have not parallelised it, as I didn’t fancy porting over a Parallel Random Number Generator (PRNG) myself.

Tomas Petricek has an example of using the GPU to calculate Pi using F# here, but his example generates the random numbers on the CPU and uploads them to the GPU to do the calculations and reduction.

### C++ AMP Version

Microsoft have just released a C++ AMP Random Number generator library to codeplex http://amprng.codeplex.com/, but I’m using BharathM port of the parallel Mersenne Twister described here: http://blogs.msdn.com/b/nativeconcurrency/archive/2011/12/20/mersenne-twister-sample-using-c-amp.aspx

His example generates random numbers into a rank 2 array of size 4096 * 2, but I’ve modified g_n_per_RNG to be 256.

The first thing I do is to pair up those random numbers into x-y coordinates and to find the square of their magnitude:

Then I determine whether the magnitude of these coordinates fall outside of the unit circle:

The final step is to do a parallel reduce of the resulting array. This is using the just-released reduction method from the C++ AMP algorithms library: http://ampalgorithms.codeplex.com/

The reduce function behaves similar to STL’s std::accumulate in that you can pass in the binary operation to perform on each item. The function takes a rank 1 array, so I’m using view_as to change rank.

To perform the timings I run the algorithm twice, once to warm up (ensure all shaders etc are compiled – see http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/1c6c9f04-1f9f-4f44-99f3-154d991ae5ba )

The results I obtained are:

Calculating the result for half a million random numbers took approximately 16 milliseconds, whereas for the single threaded C# code a million iterations took 44 ms, so accounting for using both cores the performance seems pretty similar. This is quite disappointing, even though the GPU isn’t the most powerful (a mobile 5470), the CPU isn’t the most powerful either, so I was hoping for near an order of magnitude speed up. I wasn’t doing anything clever with tiling, and there may be other bits of the API I’m not using correctly. I’ll have to get hold of Kate Gregory’s book and try with different examples.

## Bullet Physics and DirectX 11 on Windows 8 Metro

I’ve got a game idea which I think may work on a tablet so I’ve created the following ‘demo’. I’ve hooked together the Bullet Physics engine with some 3D graphics, and the falling wall of cubes could pretty much be the canonical “Hello, World” equivalent for a 3D game. I’m not sure how much time I’ll have to pursue my game idea, but it should in the worst case be a route to learning some new technologies.

The demo won’t win any awards for its graphics, and the video is pretty poor quality (I recorded the screen with my camera instead of using any screen capturing software).

Falling cubes

Microsoft seem to be definitely pushing for the use of C++ for game creation with Windows 8, with no announcement of a new version of XNA, but that seems to make sense; C++ is closer to the metal than C# and will get more performance out of lower power ARM tablets. Also, there’s quite a buzz with the C++ renaissance with C++11, and modern-style C++. The corresponding new technologies in VS 2010 (PPL) and VS 11 (auto-vectorization, C++ AMP) are especially cool.

I’m also keen to get back into C++ – the past year or so I’ve been working purely in C#, but all my prior experience was using a mixture of C# and C++, and I do miss C++ (even though I am more productive in C#). Also, my previous roles were generally in a modern style (use of STL, smart pointers etc.) but my most recent experience was “C with Classes” style code, so I’ve been relearning modern style as well as picking up the new C++11 features.

Building Bullet as a static lib was pretty easy, I just had to switch it over to use the Multi-threaded DLL c-runtime. Defining WINAPI_FAMILY=WINAPI_PARTITION_APP to detect any non-metro API usages revealed the usage of GetTickCount, which is unavailable on Metro apps, so I just turned off profiling by defining BT_NO_PROFILE 1 in btQuickprof.h

The Windows 8 samples including a couple of games which are decent guides on using DirectX 11. These demos aren’t supposed to be examples of a whole game engine, and don’t necessarily follow best practices: Marble Maze has a class DirectXBase which encapsulates all rendering logic, which is great, but the game logic is in a class MarbleMaze, which derives from this class. It’s nothing that would trip up an experienced developer, but beginners may follow the example. The Simple3DGame (which also demos DirectX and XAML interop) is better structured. The only strange thing in that example is that the render primitive classes are all C++/CX ref classes. This seems to contradict what I remember Herb Sutter saying in a Channel 9 video about sticking to ISO standard C++ where possible and only using C++/CX at the boundaries.

I’m not sure yet whether I’ll try to continue to create a simple demo of my game idea purely in C++, or create a C++/CX interop layer to facilitate writing the game logic in C# – I’m tempted to do this as I’m more productive in C#, and it’ll probably lead to a deeper understanding of WinRT which will be useful even when creating C# XAML applications.

I could have alternatively stayed purely in C# and used SharpDX, or a game framework such as ANX or Ogre (both of which say will soon be supporting XAML), but I was keen on having control of the whole game, C++ upwards – if after profiling any of the C# code any bottlenecks are found, it’ll be easier to re-implement those parts of the code in C++.

As to my experiences of developing the demo: The new version of C++ definitely feels more productive, and feels closer to C#. I love auto (var), lambdas (pretty much the same), for_each (foreach) and range for, and initializer lists. Also, async programming using PPL tasks feels very similar to .NET’s TPL. I’ve had to change mental gears a bit in thinking again about object ownership instead of relying on the garbage collector, and it took way longer than it should for me to figure out how to add a std::unique_ptr into a vector.

I could share the demo code if anyone would find it useful (I need to tidy it up a little bit, and can’t be bothered otherwise) – just post in the comments!