Realtime soft body physics with Finite Elements

I’ve had a play with implementing soft body physics as described in the books Physics-Based Animation and Game Physics Pearls. (BTW I’ve no idea what’s going on with the price of Physics-Based Animation on Amazon – I have a hardback copy available if anyone wants to buy it at that price :D)

The demo scene is a wall of cubes, anchored at the base of the wall. Each cube is made up of six tetrahedra. A force is applied to the top-right node causing the nodes to move and the elements to deform. The stiffness of the blocks is relatively low, and I haven’t applied any damping, so the resulting behaviour is as if the wall is made of jelly.

The algorithm is described pretty similarly in both books, but there are enough subtleties that it’s useful to cross-reference between them where things are unclear. My code most closely follows Game Physics Pearls.

The code makes heavy use of the Eigen matrix library, and its conjugate gradient solver. As the Eigen library makes heavy use of proxy objects via template expressions, I did get caught out more than a couple of times with assigning the result of a matrix operation to an auto-declared variable.

The rendering is done in OpenGL ES 3.0, so that it runs on desktop graphics supporting OpenGL ES (tested on Intel Skylake), and more recent Android phones (tested on an LG G4).

I used Visual Studio’s cross-platform solution template to create the solution, so the bulk of the code is common between the desktop and mobile, with only a thin platform specific layer.

The code runs fluidly on the desktop with 192 elements and 90 nodes, but runs an order of magnitude slower on mobile.

The code is available on github.

LQR Control of Inverted Pendulum in C++ using Eigen

This blog has been quiet for ages as I’ve been spending most of my commute time participating in MOOCs. One course that I found especially interesting, but have not yet found the time to complete, is underactuated robotics. I’m slowly exploring some of the ideas in my own time, and that’s what this blog post is about.

The course explores exploiting the non-linear dynamics of a system, to create more efficient, elegant control schemes. There are lots of interesting videos on youtube of acrobots, wheeled walkers, double inverted pendulums etc.

The first week investigates the non-linear dynamics of the pendulum, and I investigated that using iPython Notebooks here (or in nbviewer).

Python notebooks provide a rich documentation environment; it’s easy to generate charts and animations. But I found the actual writing and debugging of Python code in a notebook quite painful, so I’ve switched to C++ for the cart pole example.

This blog post will simply cover the LQR control of an inverted pendulum at its unstable fixed point (similar to balancing a pencil on a finger). I’ll hopefully get around to investigating the non-linear swing-up problem in the near future.

I’ve simply copied the below equations out of the underactuated notes for ease of reference (it’s strongly recommended to read the notes).

The equations of motion for a cart pole are:

$$ (m_c + m_p) \ddot{x} + m_p l \ddot{\theta} \cos\theta - m_p l \dot{\theta}^2 \sin\theta = f $$

$$ m_p l \ddot{x} \cos\theta + m_p l^2 \ddot{\theta} + m_p g l \sin\theta = 0 $$

Directly solving them for $\ddot{x}$ and $\ddot{\theta}$ gives:

$$ \ddot{x} = \frac{1}{m_c + m_p \sin^2\theta} \left[ f + m_p \sin\theta \left( l \dot{\theta}^2 + g \cos\theta \right) \right] $$

$$ \ddot{\theta} = \frac{1}{l (m_c + m_p \sin^2\theta)} \left[ -f \cos\theta - m_p l \dot{\theta}^2 \cos\theta \sin\theta - (m_c + m_p) g \sin\theta \right] $$
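Transcribed directly into C++, these two expressions look something like this (a sketch; the struct and parameter names are my own, not from the underactuated notes):

```cpp
#include <cmath>

struct CartPoleParams { double m_c, m_p, l, g; };   // cart mass, pole mass, length, gravity

struct Accelerations { double x_ddot, theta_ddot; };

// Evaluates the equations of motion solved for the accelerations.
Accelerations CartPoleDynamics(const CartPoleParams& p,
                               double theta, double theta_dot, double f)
{
    const double s = std::sin(theta);
    const double c = std::cos(theta);
    const double denom = p.m_c + p.m_p * s * s;

    const double x_ddot =
        (f + p.m_p * s * (p.l * theta_dot * theta_dot + p.g * c)) / denom;

    const double theta_ddot =
        (-f * c
         - p.m_p * p.l * theta_dot * theta_dot * c * s
         - (p.m_c + p.m_p) * p.g * s) / (p.l * denom);

    return { x_ddot, theta_ddot };
}
```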

I haven’t pulled out the linearised equations, or the derivation of the LQR control, as they are described clearly on the umich website (see the links at the end of this post).

For my implementation code, I solve the LQR K matrix offline using GNU Octave’s Control package. I may at some point investigate if there are any resources on solving the algebraic Riccati equation online, or otherwise calling Octave from C++.

For the simulation, I investigated the cart pole with an initial pole angular displacement of 0.1 radians from the unstable fixed point.

The LQR control applied to the linear system gives the following. The control does well with fast settling time and minimal overshoot.


The LQR control applied to the non-linear system does stabilise the system, but it is more oscillatory. Later lectures of the course do discuss finding where in the state-space the LQR control is applicable, and I may get around to investigating these ideas at some point.


The rest of the blog post will discuss the implementation.

The code simply simulates the system from the initial conditions, finding the desired control input at each step given the current system state. It writes out the resulting values of $x$ and $\theta$ to a space-delimited text file.

The plotting is done externally using GNUPlot. Having to round-trip out to an external program is annoying, and I may investigate charting libraries in Qt at some point.

I’m using the Eigen matrix library, which is a header-only library. It’s simple to use, and uses the processor’s SIMD instruction set for performance.

I did originally run into issues with Eigen due to alignment. For SIMD to work, the memory allocations are aligned on a 16-byte boundary. However, the Visual C++ compiler doesn’t maintain alignment on passing parameters into functions by value, and the Eigen documentation recommends passing by const ref.

Passing matrices by const ref into methods is fine, but the issue I hit is that I have a function that returns a lambda, which time steps the system given the current system state, the time delta, and the control input. This allows the linear and non-linear systems to be abstracted away from the generation of the overall output trajectory.

That function can’t generate the matrices on each call, as they’re quite expensive to calculate – it wants to capture the matrices by value, as closing over by reference would obviously reference objects which are destroyed by the time the lambda is called. However, trying to capture by value runs into the same alignment errors.

I tried to pass the CartPoleLinear by value, but it warned about alignment issues, even though it used Eigen’s recommended EIGEN_MAKE_ALIGNED_OPERATOR_NEW define. My first solution was to store each of the matrices in a shared_ptr, to ensure that the lambda kept them alive after the factory function returns. This was nasty – to store Eigen matrices in a shared_ptr requires that a custom allocator is used.

auto GetTimeStepFunctionCartPoleLinear(SystemParams system_params)
{
    // This is nasty, but Eigen has alignment issues if capturing by value
    CartPoleLinear cartPoleLinear(system_params);

    Eigen::aligned_allocator<Eigen::Matrix4d> a_allocator;
    Eigen::aligned_allocator<Eigen::Matrix<double, 4, 1>> b_allocator;
    Eigen::aligned_allocator<Eigen::Matrix<double, 2, 4>> c_allocator;
    Eigen::aligned_allocator<Eigen::Matrix<double, 2, 1>> d_allocator;

    auto pA = std::allocate_shared<Eigen::Matrix4d>(a_allocator, cartPoleLinear.AMatrix());
    auto pB = std::allocate_shared<Eigen::Matrix<double, 4, 1>>(b_allocator, cartPoleLinear.BMatrix());
    auto pC = std::allocate_shared<Eigen::Matrix<double, 2, 4>>(c_allocator, cartPoleLinear.CMatrix());
    auto pD = std::allocate_shared<Eigen::Matrix<double, 2, 1>>(d_allocator, cartPoleLinear.DMatrix());

    return [=](auto state, auto delta_time, auto u)
    {
        auto A = *pA;
        auto B = *pB;
        auto C = *pC;
        auto D = *pD;
        // ... time step using A, B, C and D as before
    };
}

My second solution was, instead of capturing shared_ptrs, to capture std::arrays, which don’t have the alignment problem. The code is still messy, having to marshal the data to and from matrices in arrays (and having to create new matrices every time the lambda is called).

auto GetTimeStepFunctionCartPoleLinear(SystemParams system_params)
{
    // This is nasty, but Eigen has alignment issues if capturing by value
    std::array<double, 16> a_array;
    std::array<double, 4> b_array;
    std::array<double, 8> c_array;
    std::array<double, 2> d_array;

    CartPoleLinear cart_pole_linear(system_params);

    Eigen::Map<Eigen::Matrix<double, 4, 4>>(a_array.data(), 4, 4) = cart_pole_linear.AMatrix();
    Eigen::Map<Eigen::Matrix<double, 4, 1>>(b_array.data(), 4, 1) = cart_pole_linear.BMatrix();
    Eigen::Map<Eigen::Matrix<double, 2, 4>>(c_array.data(), 2, 4) = cart_pole_linear.CMatrix();
    Eigen::Map<Eigen::Matrix<double, 2, 1>>(d_array.data(), 2, 1) = cart_pole_linear.DMatrix();

    return [=](auto state, auto delta_time, auto u)
    {
        Eigen::Matrix<double, 4, 4> A(a_array.data());
        Eigen::Matrix<double, 4, 1> B(b_array.data());
        Eigen::Matrix<double, 2, 4> C(c_array.data());
        Eigen::Matrix<double, 2, 1> D(d_array.data());
        // ... time step using A, B, C and D as before
    };
}

The eventual solution came when I realised that the alignment issues only occur in 32-bit builds. In 64-bit builds, Visual C++ aligns the parameters on 16-byte boundaries, so the sane code works. I’ve removed the 32-bit builds from the solution, but if I cared about 32-bit builds I’d disable the Eigen alignment instead of using the nasty workarounds above. The code is now simply:

auto GetTimeStepFunctionCartPoleLinear(SystemParams system_params)
{
  CartPoleLinear cart_pole_linear(system_params);

  auto A = cart_pole_linear.AMatrix();
  auto B = cart_pole_linear.BMatrix();
  auto C = cart_pole_linear.CMatrix();
  auto D = cart_pole_linear.DMatrix();

  return [=](auto state, auto delta_time, auto u)
  {
    // Assuming State exposes x, x_dot, theta, theta_dot
    Eigen::Vector4d state_vector(state.x, state.x_dot, state.theta, state.theta_dot);

    Eigen::Matrix<double, 4, 1> state_dot = (A * state_vector) - (B * u);

    auto new_state = state_vector + (state_dot * delta_time);
    return State{ new_state[0], new_state[1], new_state[2], new_state[3] };
  };
}
The GNUPlot commands for plotting are:

set terminal png size 400,300 enhanced font "arial,20"
set output "linear.png"
plot 'nonlinear.txt' using 1:2 title 'x' with lines, 'nonlinear.txt' using 1:3 title 'theta' with lines

I intend to investigate the swing up problem next. Another feature I miss from iPython Notebooks is JSAnimation, where it’s quite easy to produce a simple animation of the system. To be fair, it’s not that easy to produce a more complex animation. I’m planning to look at drawing using Cairo and then produce an animation from individual frames.

The code is available on github.


Bullet Physics and DirectX 11 on Windows 8 Metro

I’ve got a game idea which I think may work on a tablet so I’ve created the following ‘demo’. I’ve hooked together the Bullet Physics engine with some 3D graphics, and the falling wall of cubes could pretty much be the canonical “Hello, World” equivalent for a 3D game. I’m not sure how much time I’ll have to pursue my game idea, but it should in the worst case be a route to learning some new technologies.

The demo won’t win any awards for its graphics, and the video is pretty poor quality (I recorded the screen with my camera instead of using any screen capturing software).

Falling cubes

Microsoft definitely seem to be pushing C++ for game creation on Windows 8, with no announcement of a new version of XNA, but that makes sense; C++ is closer to the metal than C# and will get more performance out of lower-power ARM tablets. Also, there’s quite a buzz around the C++ renaissance, with C++11 and modern-style C++. The corresponding new technologies in VS 2010 (PPL) and VS 11 (auto-vectorization, C++ AMP) are especially cool.

I’m also keen to get back into C++ – the past year or so I’ve been working purely in C#, but all my prior experience was using a mixture of C# and C++, and I do miss C++ (even though I am more productive in C#). Also, my previous roles were generally in a modern style (use of STL, smart pointers etc.) but my most recent experience was “C with Classes” style code, so I’ve been relearning modern style as well as picking up the new C++11 features.

Building Bullet as a static lib was pretty easy; I just had to switch it over to use the Multi-threaded DLL C runtime. Defining WINAPI_FAMILY=WINAPI_PARTITION_APP to detect any non-Metro API usages revealed the usage of GetTickCount, which is unavailable in Metro apps, so I just turned off profiling by defining BT_NO_PROFILE 1 in btQuickprof.h.

The Windows 8 samples include a couple of games which are decent guides to using DirectX 11. These demos aren’t supposed to be examples of a whole game engine, and don’t necessarily follow best practices: Marble Maze has a class DirectXBase which encapsulates all rendering logic, which is great, but the game logic is in a class MarbleMaze which derives from it. It’s nothing that would trip up an experienced developer, but beginners may follow the example. The Simple3DGame (which also demos DirectX and XAML interop) is better structured. The only strange thing in that example is that the render primitive classes are all C++/CX ref classes, which seems to contradict what I remember Herb Sutter saying in a Channel 9 video about sticking to ISO standard C++ where possible and only using C++/CX at the boundaries.

I’m not sure yet whether I’ll try to continue to create a simple demo of my game idea purely in C++, or create a C++/CX interop layer to facilitate writing the game logic in C# – I’m tempted to do this as I’m more productive in C#, and it’ll probably lead to a deeper understanding of WinRT which will be useful even when creating C# XAML applications.

I could alternatively have stayed purely in C# and used SharpDX, or a game framework such as ANX or Ogre (both of which say they will soon support XAML), but I was keen on having control of the whole game, from C++ upwards – if profiling the C# code finds any bottlenecks, it’ll be easier to re-implement those parts in C++.

As to my experiences of developing the demo: The new version of C++ definitely feels more productive, and feels closer to C#. I love auto (var), lambdas (pretty much the same), for_each (foreach) and range for, and initializer lists. Also, async programming using PPL tasks feels very similar to .NET’s TPL. I’ve had to change mental gears a bit in thinking again about object ownership instead of relying on the garbage collector, and it took way longer than it should for me to figure out how to add a std::unique_ptr into a vector.
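For the record, the unique_ptr thing boils down to this (a minimal sketch, my own example; note VS 11’s library predates std::make_unique):

```cpp
#include <memory>
#include <vector>

struct GameObject { int id = 0; };

// unique_ptr is move-only, so it has to be moved (or constructed in place)
// to get it into a vector.
std::vector<std::unique_ptr<GameObject>> MakeObjects()
{
    std::vector<std::unique_ptr<GameObject>> objects;

    std::unique_ptr<GameObject> obj(new GameObject());
    // objects.push_back(obj);          // won't compile: copying is deleted
    objects.push_back(std::move(obj));  // transfer ownership into the vector

    // Or skip the named temporary entirely (std::make_unique from C++14 on)
    objects.push_back(std::unique_ptr<GameObject>(new GameObject()));

    return objects;
}
```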

I could share the demo code if anyone would find it useful (I need to tidy it up a little bit, and can’t be bothered otherwise) – just post in the comments!

Faster boids

I used the built-in performance profiler in VSTS to look at speeding the boids up (there seems to be a bug in the profiler – when memory profiling was turned on, it was unable to resolve symbols, so I switched to the CLRProfiler to look at the memory being used).

Most of the time in the application was spent calculating the boids’ steering behaviours.
The optimisations here aimed to reduce the time spent calculating the forces – one optimisation was that instead of each behaviour (cohesion, separation, alignment) scanning through the entire set of boids independently, the results of one behaviour’s scan could be reused by the other two.

Secondly, the cohesion force is no longer updated on each frame – the value is cached for 10 iterations. This shouldn’t be a fixed value – there should be some heuristic to determine how it is varied, once work is in place to measure the “quality” of the flock.
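The shape of this is roughly the following sketch (hypothetical names, not the actual Jabuka code, and in C++ rather than C#): one neighbour scan feeds all the behaviours, and the scan is only refreshed every few frames.

```cpp
#include <cstddef>
#include <vector>

struct Vec2 { double x = 0, y = 0; };

// One cache instance per boid: cohesion, separation and alignment all read
// the same neighbour list, and the O(n) scan is amortised over several frames.
class ProximityCache
{
public:
    explicit ProximityCache(int refresh_interval)
        : refresh_interval_(refresh_interval) {}

    const std::vector<std::size_t>& Neighbours(
        std::size_t boid, const std::vector<Vec2>& positions, double radius)
    {
        if (frame_ % refresh_interval_ == 0)
            Rescan(boid, positions, radius);   // only rescan on refresh frames
        return cached_;
    }

    void NextFrame() { ++frame_; }

private:
    void Rescan(std::size_t boid, const std::vector<Vec2>& positions, double radius)
    {
        cached_.clear();
        for (std::size_t i = 0; i < positions.size(); ++i)
        {
            if (i == boid) continue;
            const double dx = positions[i].x - positions[boid].x;
            const double dy = positions[i].y - positions[boid].y;
            if (dx * dx + dy * dy <= radius * radius)
                cached_.push_back(i);
        }
    }

    int refresh_interval_;
    long frame_ = 0;
    std::vector<std::size_t> cached_;
};
```

Between refreshes the cached list goes stale, which is exactly the trade-off the heuristic would need to manage.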

This boosted the frame rate from approx 3fps to 10fps (dropping to 5 fps after some time).

Next came some micro-optimisations.

Changing the Vector to be a struct instead of a class (to avoid over-stressing the garbage collector) resulted in 14 fps, dropping to 8 fps after running for a while.

Other micro-optimisations that I tried, but didn’t result in any real performance difference included:

Making the trail generator use a fixed-size array (instead of adding/removing from a list).

Removing IEulerRigidBody, calling onto objects directly instead of through an interface.

Changing all doubles to be floats (didn’t actually keep this change in the code).

For Scene8, the drawing code was the real bottleneck; changing the gluSphere to be drawn using VertexBufferObjects resulted in double the frame rate (and it isn’t particularly optimised – it could use glDrawElements).

For Scene7, it’s limited by the O(n²) collision detection (this is the next thing to tackle).


An improvement of the flocking behaviour in Jabuka is available.

There are now very simplistic birds drawn, to give an idea of the boid’s orientation.

The separation behaviour has been fixed. The behaviour was garbage before: the force away from another boid was proportional to the distance from it, so boids further away pushed harder. Now the force increases the nearer another boid is.
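Something like the following captures the fix (a sketch with made-up names, not the actual Jabuka code): the repulsion falls off with distance rather than growing with it, fading to zero at the separation radius.

```cpp
// Strength of the separation (repulsion) force between two boids.
// Inverse-distance falloff: strong when close, zero at or beyond the radius.
double SeparationStrength(double distance, double separation_radius)
{
    if (distance >= separation_radius || distance <= 0.0)
        return 0.0;
    return (separation_radius / distance) - 1.0;
}
```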

The boid’s orientation is taken as the smallest angle between being aligned with the World’s x axis and the boid’s velocity. The roll doesn’t behave realistically (see scene 13), but then the boid’s behaviour isn’t realistic either. A real boid would change its velocity by adjusting its orientation (changing its wing shape to apply a torque, allowing it to rotate…)

This paper describes how the steering behaviours should change the intention of the boid, instead of providing actual forces onto the boid.

A simple particle system has been introduced to allow the path of a few individual boids to be more easily seen.

Running under the Visual Studio profiler, most of the time was seen to be spent finding the flockmates in proximity of the boid.

The same calculation was being done twice, for both the cohesion and alignment behaviours. Adding a ProximityFinderCache increased the frame rate by 50%, with no change in the calculations. Adjusting the code so that the boids in proximity are only calculated on every third iteration added another 20% to the framerate (giving 15fps with only two TrailGenerators attached).


Apologies for not having another Jabuka post yet… Well, I’ve been really busy at my previous employers getting up to speed on various new technologies, then, as you may have read on my other blog, I’ve been busy preparing for a new job (I might blog a bit about that at some point), and work always takes priority. I did have to create an interesting small WPF 3D application as part of my application process (I’ll tart this up and get it online when I get the chance).

I haven’t given up on Jabuka. I spent a lot of time over Christmas researching different constraint methods (LCP solvers, PGS solvers and Sequential Impulses), and will be playing around with this as soon as I get a chance to breathe (oh, and I finally packaged up my NUnit data-driven testing extension, here).

Otherwise, my new year’s resolution is to play with F# – I attended a Next Gen Usergroup talk last year at Microsoft’s Cambridge site by Don Syme. Since then I’ve had time to do some reading around, but not to actually create anything in anger. Obviously, some functional ideas have influenced C# (lambda expressions etc.), but I want to see how working in a functional mindset might change some of my ideas about designing code.

There’s been lots of blogging recently on immutability in C# (see here and here) – in functional languages everything is immutable by default, and immutable state obviously avoids having to guard shared state across multiple threads; it’s just the overall program design I can’t picture yet (I’m used to thinking in terms of common interfaces, design patterns etc.) There are also new parallel/concurrent resources available from Microsoft which fit nicely into all this (see F#’s asynchronous workflows).

A change of approach/jabuka 0.6

As mentioned on the Jabuka website, I’m moving away from a tutorial-based approach, mainly because of lack of time. It’s much quicker to implement new code and then blog retrospectively, highlighting new features and interesting code/design decisions. In the tutorial based approach, making a change/bug fix in an earlier tutorial forces all later tutorials to be updated.

That being said, it’s worth discussing the changes in the latest version, 0.6, of Jabuka.
This code introduces a couple of simple steering behaviours, and flocking boids. Given the roadmap, why did I suddenly go off on a total tangent and implement boids? Well, apart from flocking behaviour being especially cool (look on youtube for the Carling Belong commercial, or starling flocks), I wanted to stress the engine (I use that term very loosely given the current functionality of Jabuka) as I’m itching to demonstrate multithreading and performance enhancements. This version of Jabuka has shown that it is still too early to look at performance, however, as the application still has some fundamental flaws to be addressed.

The flocking application implements a flock (based on the algorithms on Craig Reynolds’ website, and roughly on the algorithms in the Killer Game Programming in Java book). The change to Jabuka is that the force for a rigid body is now obtained in the Update() method via a new IForceProvider interface, implemented by the various behaviours.
If you look at the implementation of the steering behaviours, you can see that there is a large number of classes implementing this interface, each performing a simple operation such as aggregation/limitation. This does lead to a proliferation of classes, but I’m happy to live with it for now as it makes unit testing easier, and may lead to more easily changing behaviour dynamically at runtime.

The Update() method calling on the IForceProvider is not perfect in this case – the objects are iterated over, and each object’s Update() method calls on its IForceProvider. For the steering behaviours, the behaviour compares the body’s location to its neighbours to calculate its force and update its position. The problem is that the resulting locations depend on the order in which the bodies are updated. This causes problems in measuring the flocking quality (more on this later), and it may be desirable to update the steering forces independently of Update(), so that we don’t perform this expensive (O(n²)) operation on every time step. Also, having the state of an object dependent on every other object on every timestep will get really messy when we introduce multithreading. Having each body’s Update() method independent of any other means that it’s much easier to parallelise/vectorise/split into multiple threads – it’s likely we’d want to avoid making every IEulerRigidBody thread safe, and would want to implement the thread safety at a higher level (though I won’t jump the gun, we’ll get there soon enough).

The sort of changes above hint that it may be better to have a Physics Manager class, that owns the Collision object, the rigid bodies, and also has knowledge of the IForceProviders and manages the overall sequence of operation.

The larger problem with the current flocking behaviour, including collision detection, is that I haven’t yet implemented resting contacts (or indeed any contact forces apart from collisions), so as objects are forced towards each other, the simulation can halt. I’ve temporarily fixed this by introducing a scene where the boids don’t have any collision detection. I’ve also introduced a scene which demonstrates the halting, with only two spheres.

As to steering behaviours and flocking itself… I’ve introduced a scene for the seek behaviour, demonstrating that having the steering force directed towards the target is insufficient; the velocity tangential to the target needs to be taken into account. The idea is to find the optimal trajectory towards the target. Reynolds copes with this by providing a force such that the body’s velocity is adjusted to be the maximum velocity towards the target. This is better, but does mean that the body needs a concept of its maximum velocity (this works OK on Earth, as air friction gives a terminal velocity, but in space this is arbitrary). Also, the maximum velocity may vary – the maximum velocity into the wind is less than with a following wind, and may depend on a body’s shape (or its orientation with respect to its velocity).
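Reynolds’ trick can be sketched like this (a C++ sketch with hypothetical names, not code from Jabuka): the steering force is the difference between the desired velocity (maximum speed, straight at the target) and the current velocity, so the tangential component gets cancelled.

```cpp
#include <cmath>

struct Vec2 { double x = 0, y = 0; };

// Reynolds-style seek: steer towards the velocity you want,
// not towards the target itself.
Vec2 Seek(const Vec2& pos, const Vec2& vel, const Vec2& target, double max_speed)
{
    Vec2 to_target{ target.x - pos.x, target.y - pos.y };
    const double d = std::sqrt(to_target.x * to_target.x + to_target.y * to_target.y);
    if (d < 1e-9)
        return { -vel.x, -vel.y };  // already at the target: just kill the velocity

    // Desired velocity: max speed, straight at the target
    const Vec2 desired{ to_target.x / d * max_speed, to_target.y / d * max_speed };

    // Steering force is the difference from the current velocity
    return { desired.x - vel.x, desired.y - vel.y };
}
```

With a purely tangential velocity, the resulting force both accelerates towards the target and brakes the sideways motion.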

Looking at the arrive behaviour, where the speed is arbitrarily slowed down, the first thought that comes to mind is that a PID control towards the target would be desired. However, the problem is that the velocity tangential/perpendicular and distance from the target all need to be taken into account – a more complicated control scheme would need to be used, and that may be too computationally expensive.

The boid implementation currently uses the same proximity distance for all the individual behaviours (separation, alignment, cohesion). It may be that a smaller proximity should be used for separation. Also, I’m currently using a simple sphere, without taking into account the blind spot behind the boid.

There are lots of tweaks to be made (and a Genetic Algorithm could be used to change these dynamically), but many of them are pointless without a method of measuring the quality of the flock. There is an interesting paper on that here. Some background on the flock theory / the underlying physics of flocking, can be found here, and here.