I wanted to play around with the inverted pendulum environment in OpenAI’s Gym. But instead of training an RL agent to balance the pendulum, I try to balance it with optimal control.

*Pendulum free-falling with balancing mechanism that moves randomly*

# Dynamics of the System

Let’s start by deriving the dynamics of the system, `CartPole-v0`. Only the following variables are used in the code:

- Mass of the cart, `M` = 1.0
- Mass of the pendulum, `m` = 0.1
- Gravity, `g` = 9.8
- Length of the pendulum, `L` = 0.5

Other effects such as damping or friction are not included in the dynamics by the developer, which fortunately makes our modelling much simpler. Those four variables are illustrated in the image below. In the environment, `F` is held constant at 10.0. The only actions we can apply to the cart are `-F` or `F`. This is definitely not ideal, as we would expect a control action in continuous space.

The state of our dynamical system is represented by the vector `X` below. It’s a 2-degree-of-freedom system: the position of the cart, `x`, and the angle of the pendulum, `theta`.

Now, we have four coupled ODEs. To keep the post short, I won’t cover the derivation and linearization of the system in detail. The equations of motion can be derived with something like the Euler-Lagrange equations or Hamilton’s equations. Through the Jacobian `Df/Dx` evaluated at the upward position, we get,
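For readers who want to reproduce the linearization numerically, here is a minimal sketch. It assumes the equations of motion follow Gym's `cartpole.py` (where the `0.5` length is treated as half the pole length), the state order `[x, x_dot, theta, theta_dot]`, and a central finite-difference Jacobian. The helper names `f` and `linearize` are mine, and the resulting matrices may differ in form from an analytical derivation.

```python
import numpy as np

# Parameters listed in the post
M, m, g, L = 1.0, 0.1, 9.8, 0.5


def f(X, u):
    """Continuous-time dynamics dX/dt, assuming Gym's cartpole.py equations."""
    x, x_dot, theta, theta_dot = X
    total = M + m
    temp = (u + m * L * theta_dot**2 * np.sin(theta)) / total
    theta_acc = (g * np.sin(theta) - np.cos(theta) * temp) / (
        L * (4.0 / 3.0 - m * np.cos(theta) ** 2 / total)
    )
    x_acc = temp - m * L * theta_acc * np.cos(theta) / total
    return np.array([x_dot, x_acc, theta_dot, theta_acc])


def linearize(f, X0, u0, eps=1e-6):
    """Central-difference Jacobians A = Df/DX and B = Df/Du at (X0, u0)."""
    n = len(X0)
    A = np.zeros((n, n))
    for i in range(n):
        dX = np.zeros(n)
        dX[i] = eps
        A[:, i] = (f(X0 + dX, u0) - f(X0 - dX, u0)) / (2 * eps)
    B = ((f(X0, u0 + eps) - f(X0, u0 - eps)) / (2 * eps)).reshape(n, 1)
    return A, B


# Upward equilibrium: cart at rest, pendulum upright, no force
A, B = linearize(f, np.zeros(4), 0.0)
print(np.round(A, 4))
print(np.round(B, 4))
```

The positive entry in the `theta_acc` row of `A` is what makes the upright position unstable without feedback.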

We know `u = -KX`, and in this case `u` is the force applied to the cart. We then check whether the pair of matrices A and B is controllable (more about controllability can be found here). Yes, it is: our `ctrb` matrix is full rank.
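The rank check can be sketched with plain NumPy. The `A` and `B` below are an assumption: they are my closed-form linearization of Gym's `cartpole.py` equations with state order `[x, x_dot, theta, theta_dot]`, and may not match the matrices used in this post term by term. The helper `controllability_matrix` is a hypothetical name that builds `[B, AB, A^2 B, A^3 B]`.

```python
import numpy as np

# Parameters listed in the post
M, m, g, L = 1.0, 0.1, 9.8, 0.5
total = M + m
D = L * (4.0 / 3.0 - m / total)

# Assumed linearization about the upward position
A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -m * L * g / (total * D), 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, g / D, 0.0],
])
B = np.array([
    [0.0],
    [1.0 / total + m * L / (total**2 * D)],
    [0.0],
    [-1.0 / (total * D)],
])


def controllability_matrix(A, B):
    """Stack [B, AB, A^2 B, ..., A^(n-1) B] column-wise."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)


ctrb = controllability_matrix(A, B)
rank = np.linalg.matrix_rank(ctrb)
print("rank:", rank, "states:", A.shape[0])
```

The pair is controllable when the rank equals the number of states, which is four here.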

To control the whole system, we can simply find a K that places the eigenvalues of the closed-loop system at `<= -1.0`. I tried random values between `-1.0` and `-2.0` and found that the system is stable. But let’s find K in a more elegant way.
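As an illustration of the manual approach, here is a minimal pole-placement sketch using Ackermann's formula. Everything specific in it is an assumption: the `A` and `B` matrices are my linearization of Gym's `cartpole.py` equations (state order `[x, x_dot, theta, theta_dot]`), the four poles are arbitrary values picked between `-1.0` and `-2.0`, and `acker` is a hypothetical helper name.

```python
import numpy as np

M, m, g, L = 1.0, 0.1, 9.8, 0.5
total = M + m
D = L * (4.0 / 3.0 - m / total)

# Assumed linearization about the upward position
A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -m * L * g / (total * D), 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, g / D, 0.0],
])
B = np.array([
    [0.0],
    [1.0 / total + m * L / (total**2 * D)],
    [0.0],
    [-1.0 / (total * D)],
])


def acker(A, B, poles):
    """Single-input pole placement: K = [0 ... 0 1] ctrb^-1 phi(A)."""
    n = A.shape[0]
    ctrb = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(n)])
    coeffs = np.poly(poles)  # desired characteristic polynomial, highest power first
    phi = sum(c * np.linalg.matrix_power(A, n - i) for i, c in enumerate(coeffs))
    e_last = np.zeros((1, n))
    e_last[0, -1] = 1.0
    return e_last @ np.linalg.inv(ctrb) @ phi


poles = [-1.0, -1.2, -1.5, -2.0]  # arbitrary example values
K = acker(A, B, poles)
eigs = np.linalg.eigvals(A - B @ K)
print("K =", np.round(K, 3))
print("closed-loop eigenvalues:", np.round(np.sort(eigs.real), 3))
```

The printed eigenvalues of `A - BK` should reproduce the chosen poles, which confirms that the gain does what it is meant to do on the linear model.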

# Optimal Control

I use the Linear Quadratic Regulator (LQR) to find the best K. The idea of the technique is to find the K that minimizes the quadratic cost

$$J = \int_0^\infty \left( X^T Q X + u^T R u \right) dt$$

where `Q` is an `n x n` matrix that acts as a penalty on how far the system is from the intended set point (being stable at the upward position). `Q` is a positive semidefinite matrix. `R` plays the same role, but it penalizes the control action. In this environment, `R` won’t have a big effect, since the actuation of the cart is kind of *discrete*. Anyway, `Q` and `R` are defined below,
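As a rough sketch of how such a gain can be computed, the snippet below solves the continuous-time algebraic Riccati equation with SciPy and forms `K = R^-1 B^T P`. The identity weights are placeholders and not necessarily the values used in this post, and `A` and `B` are again my assumed linearization of Gym's `cartpole.py` equations with state order `[x, x_dot, theta, theta_dot]`.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

M, m, g, L = 1.0, 0.1, 9.8, 0.5
total = M + m
D = L * (4.0 / 3.0 - m / total)

# Assumed linearization about the upward position
A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -m * L * g / (total * D), 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, g / D, 0.0],
])
B = np.array([
    [0.0],
    [1.0 / total + m * L / (total**2 * D)],
    [0.0],
    [-1.0 / (total * D)],
])

# Placeholder weights (assumption): equal penalty on every state and on the force
Q = np.eye(4)
R = np.eye(1)

# Solve A^T P + P A - P B R^-1 B^T P + Q = 0 for P
P = solve_continuous_are(A, B, Q, R)

# Optimal state-feedback gain for u = -K X
K = np.linalg.solve(R, B.T @ P)

eigs = np.linalg.eigvals(A - B @ K)
print("K =", np.round(K, 3))
print("closed-loop eigenvalues:", np.round(eigs, 3))
```

Raising the entries of `Q` that correspond to `theta` makes the controller react more aggressively to tilt, while raising `R` makes it more reluctant to push the cart.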

The control is simulated in the environment below. It stabilizes the pendulum well, although it slowly drifts to the left (see the comparison with the first simulation). To obtain the best control performance possible, we need more flexibility in the actions than just `-10` or `10` units of `F`.
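Because the environment only accepts two actions, the continuous force `u = -KX` has to be collapsed to a direction. One simple way to do that, sketched below, is to push right when `u` is positive and left otherwise. The function name `lqr_to_action` and the gain values in `K_demo` are hypothetical placeholders, not the gain computed in this post. The mapping assumes Gym's convention that action `1` applies `+F` and action `0` applies `-F`.

```python
import numpy as np


def lqr_to_action(K, obs):
    """Map the continuous LQR force u = -K X to CartPole's discrete action.

    Returns (action, u), where action 1 pushes the cart right (+F)
    and action 0 pushes it left (-F).
    """
    u = -(K @ np.asarray(obs, dtype=float)).item()
    action = 1 if u > 0 else 0
    return action, u


# Placeholder gain (assumption), state order [x, x_dot, theta, theta_dot]
K_demo = np.array([[-1.0, -2.0, -30.0, -5.0]])

# Pendulum leaning right: the cart should be pushed right to get under it
action_right, u_right = lqr_to_action(K_demo, [0.0, 0.0, 0.05, 0.0])

# Pendulum leaning left: the cart should be pushed left
action_left, u_left = lqr_to_action(K_demo, [0.0, 0.0, -0.05, 0.0])

print(action_right, u_right)
print(action_left, u_left)
```

The returned action can be passed to the environment's `step` call at every time step. Since the magnitude of `u` is discarded, the cart always receives the full `F`, which is one plausible reason for the residual drift.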

*Pendulum has been stabilized at upward position*