Optimal Control of Double Integrator with Dynamic Programming

I showed in my last blog post how LQR control can stabilize a cart pole around its unstable fixed point.

However, LQR control doesn’t help to swing the cart pole up into the balance point from states which are far away from the fixed point.

The underactuated robotics course discusses energy pumping techniques, to add or remove energy from the system until it reaches the homoclinic orbit (the trajectory which passes through the unstable fixed point) – and when the state is near the fixed point the fixed control scheme is switched to LQR control.

The course shows an alternate approach – using dynamic programming to algorithmically find the time-minimal optimal control from any position in the state-space. The lectures and homework shows an example of the dynamic programming algorithm on the double integrator system, and I replicated this in C# to get confidence that I had understood and implemented the algorithm correctly.

The animations show how the optimal control, and the cost function, are iterated to the converged solution, via value iteration:



One thing that took me a while to figure out, and I didn’t see straight away from the notes, is that as this is an infinite horizon problem, the gamma term is necessary in adding to the previous cost.

A sample trajectory using the analytical solution is shown here:
Using the dynamic programming approach for a state space discretized in x between -10 and 10 with 49 grid and x_dot discretised between -5 and 5 with 49 grid points gives the following:


The trajectory does reach the set point, but it is skewed from the analytical solution, and the acceleration isn’t at -1 on the approach to the target, but is somewhere between -1 and 0.

This is because of the discretisation, the grid points around the switching surface are too large, so the values are being interpolated where they should be saturated at 1 or -1. Changing the state space to be more focused on this trajectory (x between -5 and 5, and x_dot between -2.5 and 2.5) with 199 points, gives a result much closer to the analytical solution:


I haven’t yet coded this up for the cart pole problem, the core of the algorithm is the same, but with more dimensions. The only real part of the algorithm that changes is the volumetric interpolation. I don’t want to recreate Python’s ndarray in C#. I might look at a C++ implementation using Eigen’s tensor class.

The code is here: https://github.com/taumuon/robotics/tree/master/underactuated_notes/DoubleIntegratorDynamicProgramming

The next topic to investigate is reinforcement learning (the solution in Drake uses a Markov Decision Process).


LQR Control of Inverted Pendulum in C++ using Eigen

This blog has been quite for ages as I’ve been spending most of my commute time participating in MOOCS. One course that I found especially interesting, but have not yet found the time to complete, is underactuated robotics. I’m slowly exploring some of the ideas in my own time, and that’s what this blog post is about.

The course explores exploiting the non-linear dynamics of a system, to create more efficient, elegant control schemes. There are lots of interesting videos on youtube of acrobots, wheeled walkers, double inverted pendulums etc.

The first week investigates the non-linear dynamics of the pendulum, and I investigated that using iPython Notebooks here (or in nbviewer).

Python notebooks provide a rich documentation environment; it’s easy to generate charts and animations, but I found the actual writing and debugging of python code in a notebook to be quite painful, so have switched to C++ for the cart pole example.

This blog post will simply cover the LQR control of an inverted pendulum at its unstable fixed point (similar to balancing a pencil on a finger). I’ll hopefully get around to investigating the non-linear swing-up problem in the near future.

I’ve simply copied the below equations out of the underactuated notes for ease of reference (it’s strongly recommended to read the notes).

The equations of motion for a cart pole are:

$$ (m_c + m_p) \ddot{x} + m_p l \ddot{\theta}cos {\theta} – m_pl \dot{\theta}^2 sin {\theta} = f $$

$$ m_p l \ddot{x} cos {\theta} + m_p l^2 \ddot{\theta} + m_p g l sin {\theta} = 0 $$

Directly solving them for $\ddot{x}$ and $\ddot{\theta}$ gives:

$$ \ddot{x} = \frac{1}{m_c + m_p sin^2 {\theta}} [f + m_p sin{\theta}(l \dot{\theta}^2 + g cos{\theta})]$$

$$ \ddot{\theta} = \frac{1}{l(m_c + m_p sin^2 \theta)} [-f cos \theta -m_pl\dot{\theta}^2 cos{\theta}sin{\theta}-(m_c+m_p)g sin{\theta}] $$

I haven’t pulled out the linearised equations, or the derivation of the LQR control, as they are described clearly on the umich website (see the links at the end of this post).

For my implementation code, I solve the LQR K matrix offline using GNU Octave’s Control package. I may at some point investigate if there are any resources on solving the algebraic Riccati equation online, or otherwise calling Octave from C++.

For the simulation, I investigated the cart pole with an initial pole angular displacement of 0.1 radians from the unstable fixed point.

The LQR control applied to the linear system gives the following. The control does well with fast settling time and minimal overshoot.


The LQR control applied to the non-linear system does stabilise the system, but it is more oscillatory. Later lectures of the course do discuss finding where in the state-space the LQR control is applicable, and I may get around to investigating these ideas at some point.


The rest of the blog post will discuss the implementation.

The code simply simulates the system from the initial conditions, finding the desired control input at each step given the current system state. It writes out the resulting values of x and ${\theta}$ to a space-delimited text file.

The plotting is done externally using GNUPlot. Having to round-trip out to an external program is annoying, and I may investigate charting libraries in Qt at some point.

I’m using the Eigen matrix library, which is a header-only library. It’s simple to use, and uses the processor’s SIMD instruction set for performance.

I did originally run into issues with Eigen due to alignment. For SIMD to work, the memory allocations are aligned on a 16-byte boundary. However, the Visual C++ compiler doesn’t maintain alignment on passing parameters into functions by value, and the Eigen documentation recommends passing by const ref.

Passing matrices by const ref into methods is fine, but the issue I hit is that I have a function that returns a lambda, which time steps the system given the current system state, the time delta, and the control input. This allows the linear and non-linear systems to be abstracted away from the generation of the overall output trajectory.

That function can’t generate the matrices on each call, as they’re quite expensive to calculate – it wants to capture the matrices by value, as closing over by reference would obviously reference objects which are destroyed by the time the lambda is called. However, trying to capture by value runs into the same alignment errors.

I tried to pass the CartPoleLinear by value, but it warned about alignment issues, even though it used Eigen’s recommended EIGEN_MAKE_ALIGNED_OPERATOR_NEW define. My first solution was to store each of the matrices in a shared_ptr, to ensure that the lambda kept them alive after the factory function returns. This was nasty – to store Eigen matrices in a shared_ptr requires that a custom allocator is used.

auto GetTimeStepFunctionCartPoleLinear(SystemParams system_params)
    // This is nasty, but Eigen has alignment issues if capturing by value

    CartPoleLinear cartPoleLinear(system_params);
    Eigen::aligned_allocator<Eigen::Matrix4d> a_allocator;
    Eigen::aligned_allocator<Eigen::Matrix<double, 4, 1>> b_allocator;
    Eigen::aligned_allocator<Eigen::Matrix<double, 2, 4>> c_allocator;
    Eigen::aligned_allocator<Eigen::Matrix<double, 2, 1>> d_allocator;

    auto pA = std::allocate_shared<Eigen::Matrix4d>(a_allocator, cartPoleLinear.AMatrix());
    auto pB = std::allocate_shared<Eigen::Matrix<double, 4, 1>>(b_allocator, cartPoleLinear.BMatrix());
    auto pC = std::allocate_shared<Eigen::Matrix<double, 2, 4>>(c_allocator, cartPoleLinear.CMatrix());
    auto pD = std::allocate_shared<Eigen::Matrix<double, 2, 1>>(d_allocator, cartPoleLinear.DMatrix());

    return [=](auto state, auto delta_time, auto u)
        auto A = *pA;
        auto B = *pB;
        auto C = *pC;
        auto D = *pD;

My second solution was to instead of capturing shared_ptrs, instead capture std::arrays which don’t have the alignment problem. The code is still messy having to marshall the data to and from matrices in arrays (and having to create new matrices every time that the lambda is called).

auto GetTimeStepFunctionCartPoleLinear(SystemParams system_params)
    // This is nasty, but Eigen has alignment issues if capturing by value

    std::array<double, 16> a_array;
    std::array<double, 4> b_array;
    std::array<double, 8> c_array;
    std::array<double, 2> d_array;

    CartPoleLinear cart_pole_linear(system_params);

    Eigen::Map<Eigen::Matrix<double, 4, 4>>(a_array.data(), 4, 4) = cart_pole_linear.AMatrix();
    Eigen::Map<Eigen::Matrix<double, 4, 1>>(b_array.data(), 4, 1) = cart_pole_linear.BMatrix();
    Eigen::Map<Eigen::Matrix<double, 2, 4>>(c_array.data(), 2, 4) = cart_pole_linear.CMatrix();
    Eigen::Map<Eigen::Matrix<double, 2, 1>>(d_array.data(), 2, 1) = cart_pole_linear.DMatrix();

    return [=](auto state, auto delta_time, auto u)
        Eigen::Matrix<double, 4, 4 >> A(a_array.data());
        Eigen::Matrix<double, 4, 1 >> B(b_array.data());
        Eigen::Matrix<double, 2, 4 >> C(c_array.data());
        Eigen::Matrix<double, 2, 1 >> C(d_array.data());

The eventual solution was when I realised that the alignment issues are only in 32 bit builds. In 64 bit builds, Visual C++ aligns the parameters on 16 byte boundaries, so the sane code works. I’ve removed the 32 bit builds from the solution, but if I cared about 32 bit builds I’d disable the Eigen alignment instead of the nasty workaround above. The code is now simply:

auto GetTimeStepFunctionCartPoleLinear(SystemParams system_params)
  CartPoleLinear cart_pole_linear(system_params);

  auto A = cart_pole_linear.AMatrix();
  auto B = cart_pole_linear.BMatrix();
  auto C = cart_pole_linear.CMatrix();
  auto D = cart_pole_linear.DMatrix();

  return [=](auto state, auto delta_time, auto u)
    Eigen::Vector4d state_vector(state.data());

    Eigen::Matrix<double, 4, 1> state_dot = (A * state_vector) - (B * u);

    auto new_state = state_vector + (state_dot * delta_time);
    return State{ new_state[0], new_state[1], new_state[2], new_state[3] };

The GNUPlot commands for plotting are:

set terminal png size 400,300 enhanced font "arial,20"
set output "linear.png"
plot 'nonlinear.txt' using 1:2 title 'x' with lines, 'nonlinear.txt' using 1:3 title 'theta' with lines

I intend to investigate the swing up problem next. Another feature I miss from iPython Notebooks is JSAnimation, where it’s quite easy to produce a simple animation of the system. To be fair, it’s not that easy to produce a more complex animation. I’m planning to look at drawing using Cairo and then produce an animation from individual frames.

The code is available on github.