Kalman Filters Part 1

State Transitions

Exercise 3-1 from Probabilistic Robotics

This notebook develops a Kalman filter state transition per exercise 3-1 (page 81) from Probabilistic Robotics by Thrun, Burgard, and Fox. In addition to analytical solutions, I've included a Monte Carlo simulation of the scenario in exercise 3-1, coded in Python. I've also included graphs and tables that compare the analytical and simulation results. The measurement update for the Kalman filter is addressed in the next notebook for exercise 3-2.

1. Scenario Description

Exercise 3-1

  1. In this and the following exercise, you are asked to design a Kalman filter for a simple dynamical system: a car with linear dynamics moving in a linear environment. Assume $\Delta T = 1$ for simplicity. The position of the car at time $t$ is given by $x_t$. Its velocity is $\dot x_t$, and its acceleration is $\ddot x_t$. Suppose the acceleration is set randomly at each point in time, according to a Gaussian with zero mean and covariance $\sigma^2 = 1$.

Discussion

Let's break this down. The term Gaussian refers to a normal probability distribution:

$$ \tag{1} p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) $$

Since $\mu = 0$ and $\sigma = 1$, the probability that the acceleration at time $t$ will be $\ddot{x_t}$ simplifies to:

$$ \tag{2} p(\ddot x_t) = \frac{1}{\sqrt{2\pi}} exp\left(-\frac{\ddot x_t^2}{2}\right) $$
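Equation (2) is just the standard normal density. As a quick check (this cell is an addition to the original notebook and assumes SciPy is available), we can compare a hand-coded version of equation (2) against scipy.stats.norm:

import math
import scipy.stats as stats

def p_accel(a):
    """Probability density of acceleration a, per equation (2)."""
    return math.exp(-a**2 / 2) / math.sqrt(2 * math.pi)

# The two columns should match (approximately 0.3989 and 0.1295).
for a in (0.0, 1.5):
    print(p_accel(a), stats.norm.pdf(a))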

2. Markovian State Vector

Exercise 3-1(a)

(a) What is the minimal state vector for the Kalman filter (so that the resulting system is Markovian)?


Solution

The state vector is a set of parameters that describes the system at time $t$. For the state vector to be Markovian, it must completely describe the system, such that the values of the state vector at times earlier than $t$ are unnecessary. In other words, knowing the values of the state vector at earlier times does not improve our ability to determine future states.

With a small amount of analysis, we can conclude that the state vector consists only of the position $x_t$ and the velocity $\dot x_t$. Assuming that the acceleration changes instantaneously at each time step and is constant for the duration of the step, then according to basic physics the values $x_t$ and $\dot x_t$ are related to $x_{t-1}$ and $\dot x_{t-1}$ by the following equations:

$$ \tag{3} \dot x_t = \dot x_{t-1} + \ddot x_t $$

$$ \tag{4} x_t = x_{t-1} + \dot x_{t-1} + \frac{\ddot x_t}{2} $$

Both equations (3) and (4) take advantage of $\Delta T$ always being equal to 1. Equation (3) simply states that the velocity at time $t$ equals the sum of the velocity at time $t-1$ and the acceleration at time $t$. Equation (4), the formula for position, is a bit trickier. Assuming that the acceleration $\ddot x_t$ is constant from time $t-1$ to time $t$, the acceleration's contribution to the change in position over one unit of time is half the acceleration (the rest of the change comes from the previous velocity $\dot x_{t-1}$). From equations (3) and (4), we can see that the state vector must include $x_t$ and $\dot x_t$.
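Here is a one-step numerical illustration of equations (3) and (4) (an added sketch, not part of the original exercise), starting at rest with an acceleration of 1 over a single interval $\Delta T = 1$:

# One step of equations (3) and (4), starting from rest with acceleration 1.
x_prev, v_prev, a = 0.0, 0.0, 1.0
v = v_prev + a               # equation (3): new velocity
x = x_prev + v_prev + a / 2  # equation (4): new position
print(v, x)                  # 1.0 0.5 -- the position changes by half the acceleration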

But is acceleration included in the state vector? Note that equations (3) and (4) only depend on the acceleration at the current time $t$, not on the acceleration at time $t-1$. If you are still not convinced, review equation (2), the Gaussian probability of observing any particular value of $\ddot x_t$. Equation (2) has no reference to $\ddot x_{t-1}$! It only depends on the value of acceleration at the current time. Knowing the value of acceleration at earlier times does not help us predict acceleration at future times.

In summary, the state vector $\vec x_t$ has length 2 and consists only of position and velocity at time $t$:

$$ \tag{5} \vec x_t = \left\lgroup \begin{matrix}x_t \cr \dot x_t \end{matrix} \right\rgroup $$

3. State Transition Probability

Exercise 3-1(b)

(b) For your state vector, design the state transition probability $p(x_t \mid u_t, x_{t-1})$. Hint: this transition function will possess linear matrices $A$ and $B$ and a noise covariance $R$ (c.f., Equation (3.4) and Table 3.1).


Solution

A Kalman filter relies on the state transition probability equation to calculate the value of $\vec x_t$ from $\vec x_{t-1}$:

$$ \tag{6} \vec x_t = \mathbf{A}_t \vec x_{t-1} + \mathbf{B}_t \vec u_t + \vec \epsilon_t $$

  • The values $\vec x_t$ and $\vec x_{t-1}$ represent the state vector at times $t$ and $t-1$. Per equation (5), the state vector has length 2, consisting of position and velocity.
  • The variable $\vec \epsilon_t$ is a Gaussian random variable that models the uncertainty in the robot's state transition. It is the same length as the state vector. In this example, $\vec \epsilon_t$ contains the acceleration terms from equations (3) and (4).
  • We have not yet discussed $\vec u_t$. The value $\vec u_t$ is the control vector. It can be any length and represents the control actions applied to the robot at time $t$. In this example we are not sending any instructions to the robot, so $\vec u_t$ is always 0. $\mathbf{B}_t$ is an $n \times m$ matrix where $n$ is the length of the state vector and $m$ is the length of the control vector. $\mathbf{B}_t$ defines how the elements of the control vector modify the elements in the state vector. In this example, the robot does not accept any instructions (i.e., controls), so all elements of $\mathbf{B}_t$ are zero. We will ignore $\vec u_t$ and $\mathbf{B}_t$ for the remainder of this notebook.
  • $\mathbf{A}_t$ is a square matrix of size $n \times n$ where $n$ is the length of the state vector (2 in this example). Multiplying $\mathbf{A}_t$ by the state vector at time $t-1$ will calculate the state vector at the subsequent time $t$, ignoring any instructions provided to the robot ($\vec u_t$) or random noise ($\vec \epsilon_t$).

To answer this exercise, we choose values of $\mathbf{A}_t$, $\mathbf{B}_t$, and $\vec \epsilon_t$ that result in equation (6) being identical to equations (3) and (4): $$ \tag{7} \mathbf{A}_t = \left\lgroup \begin{matrix}1 & 1 \cr 0 & 1\end{matrix} \right\rgroup \\ \mathbf{B}_t = 0 \\ \vec \epsilon_t = \ddot x_t \left\lgroup \begin{matrix}\frac{1}{2} \cr 1 \end{matrix}\right\rgroup $$

You can verify that the values in equation (7) are correct by inserting them into equation (6) and verifying that the result is equivalent to equations (3) and (4).
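Here is that verification done symbolically with Sympy (an added sketch; the symbols x_prev, v_prev, and a stand for $x_{t-1}$, $\dot x_{t-1}$, and $\ddot x_t$):

import sympy

x_prev, v_prev, a = sympy.symbols('x_prev v_prev a')

A = sympy.Matrix([[1, 1], [0, 1]])
state_prev = sympy.Matrix([x_prev, v_prev])            # (position, velocity) at time t-1
epsilon = a * sympy.Matrix([sympy.Rational(1, 2), 1])  # noise vector from equation (7)

# A * state_prev + epsilon should reproduce equations (4) and (3).
print(A * state_prev + epsilon)
# Matrix([[x_prev + v_prev + a/2], [v_prev + a]])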

But wait, we're not done yet. We still need to calculate the value of $\mathbf{R}$.

$\mathbf{R}$ is the covariance matrix for the Gaussian random variable $\vec \epsilon_t$. A covariance matrix is an $n \times n$ square matrix, where $n$ is the length of the state vector $\vec x_t$, whose diagonal elements are the variances of the elements of the random noise vector $\vec \epsilon_t$. The off-diagonal elements $r_{ij}$ are the covariances between elements $i$ and $j$ of the noise vector. The covariance matrix is symmetric: element $r_{ij}$ = element $r_{ji}$.

The second element of the noise vector is the acceleration $\ddot x_t$. We were given the variance of the acceleration in the problem definition: ${\sigma_{\ddot x}}^2 = 1$.

The first element of the noise vector is a linear function (one half) of the acceleration. Consequently, we can calculate its variance from the variance of the acceleration using the rule $V(aX) = a^2 V(X)$ for scaling a random variable $X$ by a constant $a$:

$${\sigma_1}^2 = V\left(\frac{1}{2} \ddot x_t\right) = \left(\frac{1}{2}\right)^2 {\sigma_{\ddot x}}^2 = \frac{1}{4}$$

The final value we need for the matrix $\mathbf{R}$ is the covariance between the two elements of the noise vector. Since the first element is precisely determined by the second, we know that the correlation $\rho = 1$. Covariance and correlation are related by the following equation:

$$ \tag{8} \rho = \frac{\rm cov}{\sigma_1 \sigma_2} \\ \rm cov = \rho \sigma_1 \sigma_2 = 1 \times \frac{1}{2} \times 1 = \frac{1}{2} $$

Consequently: $$ \tag{9} \mathbf{R} = \left\lgroup \begin{matrix}\frac{1}{4} & \frac{1}{2} \cr \frac{1}{2} & 1 \end{matrix} \right\rgroup $$
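We can also sanity-check $\mathbf{R}$ numerically (an added sketch): draw many random accelerations, form the noise vector $(\ddot x_t / 2, \; \ddot x_t)$, and confirm that its sample covariance is close to equation (9):

import numpy as np

accels = np.random.normal(size=100000)         # accelerations with variance 1
noise = np.column_stack([accels / 2, accels])  # (position noise, velocity noise)
print(np.cov(noise, rowvar=False))             # expect approximately [[0.25, 0.5], [0.5, 1.0]]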

4. State Prediction at Times 0 Through 5

Exercise 3-1(c)

(c) Implement the state prediction step of the Kalman filter. Assuming we know at time $t = 0$, $x_0 = \dot x_0 = \ddot x_0 = 0$. Compute the state distributions for times 1, 2, ..., 5.


Solution

The robot's internal knowledge about the state of its environment is called its belief. Its belief following the calculation of the state transition, but before incorporating updates for external measurements, is called a prediction.

Time t = 0

Since we know the position, velocity, and acceleration at time $t = 0$, the state distribution is not a Gaussian at time $t = 0$. It's a discrete probability distribution with all probability located at a single point:

$$ \tag{10} p(\vec x_0) = \left\lgroup \begin{matrix}1: & x = 0 \: \mathsf{and} \: \dot x = 0 \cr 0: & x \neq 0 \: \mathsf{or} \: \dot x \neq 0 \end{matrix} \right\rgroup $$

Kalman filter algorithms represent the state at time $t$ as a mean vector $\vec \mu_t$ and a covariance matrix $\Sigma_t$. At time $t = 0$:

$$ \tag{11} \vec \mu_0 = \left\lgroup \begin{matrix}0 \cr 0 \end{matrix} \right\rgroup \\ \mathbf{\Sigma}_0 = \left\lgroup \matrix{0 & 0 \cr 0 & 0} \right\rgroup $$

Time t = 1

The Kalman filter algorithm calculates the mean vector at time $t$ from the mean vector at time $t-1$ using the following equation (the bar above $\mu$ indicates it is a prediction that does not incorporate a measurement at time $t$):

$$ \tag{12} \bar{\vec \mu_t} = \mathbf A_t \vec \mu_{t-1} + \mathbf B_t \vec u_t $$

Since $\vec u_t = 0$ for all values of $t$ and $\vec \mu_0 = \vec 0$, we can conclude that $\bar{\vec \mu}_t$ will be equal to $\vec 0$ for all values of time $t$.

Covariance at time $t$ is calculated from the covariance at time $t - 1$ per the following equation:

$$ \tag{13} \bar{\mathbf{\Sigma}_t} = \mathbf A_t \mathbf \Sigma_{t-1} \mathbf A_t^T + \mathbf R $$

Since $\mathbf{\Sigma}_0 = 0$, then $\mathbf A_t \mathbf \Sigma_{t-1} \mathbf A_t^T = 0$ as well, which means:

$$ \tag{14} \bar{\mathbf{\Sigma}_1} = \mathbf{R} = \left\lgroup \matrix{\frac{1}{4} & \frac{1}{2} \cr \frac{1}{2} & 1} \right\rgroup $$

Time t = 2

$\bar{\mathbf{\Sigma}_2}$:

Since I'm too lazy to multiply matrices by hand, we'll use Python to calculate $\mathbf{\Sigma}_2$. First, we'll create the applicable matrices in Python. The Python variables will be R for $\mathbf{R}$, A for $\mathbf{A}$, and Sigma_n for $\mathbf{\Sigma}_n$ where n = 1, 2, 3, ... .

In [1]:
# Import the Sympy Python package for symbolic mathematics
import sympy
sympy.init_printing()  # Attractively format matrices

# Create matrix R: covariance of random noise.
R = sympy.Matrix([[1/4, 1/2], [1/2, 1]])
R
Out[1]:
$$\left[\begin{matrix}0.25 & 0.5\\0.5 & 1\end{matrix}\right]$$
In [2]:
# Create matrix 'A' for state transitions.
A = sympy.Matrix([[1, 1],[0, 1]])
A
Out[2]:
$$\left[\begin{matrix}1 & 1\\0 & 1\end{matrix}\right]$$

Per equation (14), $\mathbf{\Sigma}_1 = \mathbf{R}$:

In [3]:
Sigma_1 = R

Now we'll calculate $\mathbf{\Sigma}_2$ using equation (13). Note that in the Python Sympy module, A.T is the transpose of matrix A.

In [4]:
Sigma_2 = A * R * A.T + R
Sigma_2
Out[4]:
$$\left[\begin{matrix}2.5 & 2.0\\2.0 & 2\end{matrix}\right]$$

Time t = 3

$\bar{\mathbf{\Sigma}_3}$:

In [5]:
Sigma_3 = A * Sigma_2 * A.T + R
Sigma_3
Out[5]:
$$\left[\begin{matrix}8.75 & 4.5\\4.5 & 3\end{matrix}\right]$$

Time t = 4

$\bar{\mathbf{\Sigma}_4}$:

In [6]:
Sigma_4 = A * Sigma_3 * A.T + R
Sigma_4
Out[6]:
$$\left[\begin{matrix}21.0 & 8.0\\8.0 & 4\end{matrix}\right]$$

Time t = 5

$\bar{\mathbf{\Sigma}_5}$:

In [7]:
Sigma_5 = A * Sigma_4 * A.T + R
Sigma_5
Out[7]:
$$\left[\begin{matrix}41.25 & 12.5\\12.5 & 5\end{matrix}\right]$$

5. Digression: What's This All Mean?

These covariance matrices are good and all, but what does this all mean? How do these matrices help us to understand the robot and its environment?

Also, how do we know if they are correct? Equation (12) is reasonably intuitive, but equation (13) for the covariance matrix? Not so much. The proof of equations (12) and (13) extends over five pages in Probabilistic Robotics and requires calculus and something called the matrix inversion lemma. (I hope to work through it eventually, but not yet.)

Monte Carlo Simulation

An easier way to visualize what's happening and convince ourselves that these results are correct is to run a Monte Carlo numeric simulation. Monte Carlo simulations use pseudo-random number generators to replicate scenarios. Running the simulation many times and recording the results helps us to understand the range of possible outcomes, as well as the likelihood of specific outcomes.

The first part of the Monte Carlo simulation is listed below. The Python function rnd_move() calculates velocity and position for a specified number of time intervals, using a randomly generated acceleration for each time step.

In [8]:
import pandas as pd
import scipy.stats as stats

def rnd_move(time, return_df=False):
    """Caculates the robot's position and velocity at time t.
    
    This function assumes the robot starts at position 0, with velocity 0.
    The function randomly generates an acceleration for each step that is
    normally distributed with mean 0 and variance = 1.
    
    Args:
        time: a positive integer representing the time t.
        return_df: if True, returns results as a dataframe. Optional, default
            is False.
            
    Returns:
        If return_df is True, returns a Pandas Dataframe, where each row of
        the dataframe corresponds to one time interval, with the first row
        containing the position, velocity, and acceleration at time t = 0,
        the second row showing time t = 1, etc.
        
        If return_df is False, rnd_move() returns a tuple containing the
        acceleration, velocity, and position.
    """
    # Randomly generated accelerations from a normal distribution.
    accelerations = list(stats.norm.rvs(size=time))
    
    # Initialize variables
    velocities = [0]
    positions = [0]
    vel_1 = 0
    vel = 0
    pos_1 = 0
    pos = 0
    
    # Calculate velocities and positions
    for acc in accelerations:
        vel = acc + vel_1
        pos = pos_1 + vel_1 + acc/2
        velocities.append(vel)
        positions.append(pos)
        vel_1 = vel
        pos_1 = pos
    accelerations.insert(0, 0)
    
    # Return results
    if return_df:
        return pd.DataFrame({"Acceleration": accelerations,
                             "Velocity": velocities,
                             "Position": positions})
    else:
        return accelerations[-1], velocities[-1], positions[-1]

Let's verify that rnd_move() works.

In [9]:
df = rnd_move(5, return_df=True)
df
Out[9]:
Acceleration Position Velocity
0 0.000000 0.000000 0.000000
1 -0.127895 -0.063947 -0.127895
2 1.161590 0.388953 1.033695
3 -0.314031 1.265633 0.719664
4 0.326284 2.148439 1.045948
5 -1.707905 2.340435 -0.661956
In [10]:
# The Python assert statement throws an error if its argument is not True.
# So if this code does not generate any errors, then rnd_move() is working.

# Check that velocity = acceleration + velocity from the previous line
for row in range(1, 6):
    assert(df.Velocity[row] == df.Acceleration[row] + df.Velocity[row-1])
print("Velocities verified correct")

# Check that position = position + velocity from previous line +
#     acceleration / 2
for row in range(1, 6):
    assert(df.Position[row] == df.Position[row-1] + df.Velocity[row-1] +
           df.Acceleration[row]/2)
print("Positions verified correct")
Velocities verified correct
Positions verified correct

Now we need to write a function that will run rnd_move numerous times (each time is called a replication) and store the results.

In [11]:
def rnd_sim(time, n=100):
    """Runs the numerical simulation multiple times.
    
    Args:
        time: a positive integer representing the time t.
        n: a positive integer specifying the number of replications.
            
    Returns:
        A Pandas Dataframe containing the final velocity and position for
        each simulation.
    """
    velocities=[]
    positions=[]
    
    for _ in range(n):            
        _, vel, pos = rnd_move(time)
        velocities.append(vel)
        positions.append(pos)
        
    return pd.DataFrame({"Velocity": velocities, "Position": positions})

Simulation at t = 1

Now let's run the simulation 100 times for time $t = 1$ and look at the results.

In [12]:
# Run the simulation 100 times.
sim1_data = rnd_sim(1)

# Check the first few rows of data to make sure it looks right.
sim1_data.head()
Out[12]:
Position Velocity
0 0.191649 0.383298
1 1.146622 2.293244
2 0.344723 0.689446
3 -0.501295 -1.002589
4 0.114651 0.229302

At first glance, this table looks very similar to the one above, with columns for position and velocity. But here each row is a separate replication of the simulation, showing the position and velocity after one time interval $\Delta T$.

The table looks right. Let's plot it:

In [13]:
import matplotlib
matplotlib.use('nbAgg')
import matplotlib.pyplot as plt

def plot_sim(data, time):
    plt.clf()
    plt.plot(data.Position, data.Velocity, 'b.')
    title = ("Robot Position at Time t = {0}\n"
             " {1} Replications".format(time, data.shape[0]))
    plt.title(title)
    plt.xlabel("Position")
    plt.ylabel("Velocity")
    plt.show()
    
plot_sim(sim1_data, 1)

Well, that's interesting.

  1. As expected, velocity and position are both centered around 0.
  2. Velocity extends over a wider range than position. Velocity extends from about -2 to 2 while position only extends from about -1 to 1. This is expected because the variance of velocity is larger than the variance of position: 1.0 versus 0.25, respectively, at time $t = 1$.
  3. As expected for normally distributed data, there are fewer results at the ends of the line than in the middle.

But what's most interesting to me (even though it's not unexpected) is that all of the values plot on a straight line. After a single time interval, there is only one possible position for every velocity and vice-versa. This situation is reflected in the covariance matrix $\mathbf{\Sigma}_1$. The off-diagonal covariance elements of $\mathbf{\Sigma}_1$ (the elements at positions 0,1 and 1,0) correspond to a correlation of 1 (see equation 8).

Degenerate Probability Distributions

The probability distribution for time $t = 1$ is a degenerate probability distribution. Even though it is defined over a two-dimensional space, all of its probability is concentrated on a one-dimensional subset (the line in the plot above). Higher-dimensional probability distributions can be degenerate as well: a 3-dimensional distribution whose probability is confined to a 1- or 2-dimensional subset would also be degenerate. One way to determine that a probability distribution is degenerate is to check the determinant of the covariance matrix:

\begin{align} \tag{15} \det(\mathbf{\Sigma}_1) & = \det \left\lgroup \matrix{\frac{1}{4} & \frac{1}{2} \cr \frac{1}{2} & 1} \right\rgroup \\ & = \frac{1}{4} \times 1 - \frac{1}{2} \times \frac{1}{2} \\ & = 0 \end{align}

If the determinant of the covariance matrix is zero, then the probability distribution will be degenerate.
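We already have $\mathbf{\Sigma}_1$ in Python, so this is easy to confirm (an added check):

# The covariance matrix at t = 1 is singular: its determinant is 0.
print(Sigma_1.det())

# A singular matrix also cannot be inverted, which matters below.
try:
    Sigma_1.inv()
except Exception as err:
    print("Cannot invert Sigma_1:", err)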

In addition, when the determinant is zero, the matrix cannot be inverted. This matters because the multivariate (i.e., multi-dimensional) normal probability distribution requires inversion of the covariance matrix $\mathbf{\Sigma}$ (matrix inversion is indicated by the superscript -1):

$$ \tag{16} p(\vec x) = \det(2\pi\mathbf{\Sigma})^{-\frac{1}{2}} \exp \left(-\frac{1}{2}(\vec x - \vec \mu)^T \mathbf{\Sigma}^{-1}(\vec x - \vec \mu)\right)$$

So equation (16) cannot be the probability distribution for $\vec x_1$. Instead:

$$ \tag{17} p(\vec x_1) = \left\lgroup \begin{matrix}\frac{1}{\sqrt{2\pi}} \exp \left( \frac{-{\dot x_1}^2}{2} \right) &: x = \frac{\dot x}{2} \cr 0 &: x \neq \frac{\dot x}{2} \end{matrix} \right\rgroup $$

Simulation at time 2

Now let's plot the numerical simulation results for $t = 2$

In [14]:
# Run the simulation 100 times with two steps.
sim2_data = rnd_sim(2)
plot_sim(sim2_data, 2)

OK, now we're getting somewhere. The results are no longer perfectly correlated. We can also show that the determinant of the covariance matrix at time $t = 2$, $\mathbf{\Sigma}_2$, is no longer 0. Due to laziness, I will have Python figure out the determinant.

In [15]:
# Covariance matrix at time t = 2, calculated using the Kalman algorithm
Sigma_2
Out[15]:
$$\left[\begin{matrix}2.5 & 2.0\\2.0 & 2\end{matrix}\right]$$
In [16]:
# Determinant of covariance matrix does not equal 0
Sigma_2.det()
Out[16]:
$$1.0$$

Now we'll verify that the covariance matrix from the numerical simulation matches the calculated covariance matrix. This should convince us that the covariance matrix calculated by the Kalman algorithm represents reality.

In [17]:
# See if the covariance matrix for simulation matches calculated matrix
# Running 1000 replications for greater accuracy
rnd_sim(2, 1000).cov()
Out[17]:
Position Velocity
Position 2.532067 2.004440
Velocity 2.004440 1.995906

At time $t = 2$ and later, the probability distribution is no longer degenerate, which means we can state the probability that $\vec x_t$ assumes a specific value using the multi-variate form of the normal probability distribution:

$$ \tag{18} p(\vec x_t) = \det(2\pi\mathbf{\Sigma}_t)^{-\frac{1}{2}} \exp \left(-\frac{1}{2} {(\vec {x}_t - \vec \mu_t)}^T {\mathbf{\Sigma}_t}^{-1} (\vec {x}_t - \vec \mu_t) \right)$$

Since $\vec \mu_t = \left\lgroup \begin{matrix} 0 \\ 0 \end{matrix} \right\rgroup$, equation (18) simplifies to:

$$ \tag{19} p(\vec x_t) = \det(2\pi\mathbf{\Sigma}_t)^{-\frac{1}{2}} \exp \left(-\frac{1}{2} {\vec {x}_t}^T {\mathbf{\Sigma}_t}^{-1} \vec {x}_t \right)$$
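As a quick sanity check of equation (19) (an added sketch; it assumes SciPy is available), we can compare a hand-coded density value against scipy.stats.multivariate_normal for the non-degenerate case $t = 2$:

import numpy as np
from scipy.stats import multivariate_normal

Sigma_2_np = np.array(Sigma_2.tolist(), dtype=float)
x = np.array([1.0, 0.5])  # an arbitrary (position, velocity) point

manual = (np.linalg.det(2 * np.pi * Sigma_2_np) ** -0.5 *
          np.exp(-0.5 * x @ np.linalg.inv(Sigma_2_np) @ x))
print(manual, multivariate_normal(mean=[0, 0], cov=Sigma_2_np).pdf(x))  # the two values should match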

6. Uncertainty (Error) Ellipse

Exercise 3-1(d)

(d) For each value of $t$, plot the joint posterior over $x$ and $\dot x$ in a diagram where $x$ is the horizontal and $\dot x$ is the vertical axis. For each posterior, you are asked to plot an uncertainty ellipse, which is the ellipse of points that are one standard deviation away from the mean. Hint: if you do not have access to a mathematics library, you can create those ellipses by analyzing the eigenvalues and eigenvectors of the covariance matrix.


Solution

It's still difficult to visualize what the covariance matrix means. Fortunately, there is a technique for visualizing a two-dimensional covariance matrix.

We can create an ellipse from the covariance matrix by calculating the eigenvalues and eigenvectors of the matrix. I will not try to explain eigenvalues and eigenvectors in this document (see http://www.visiondummy.com/2014/03/eigenvalues-eigenvectors/), but they can be calculated easily enough with Python:

In [18]:
eigen2 = Sigma_2.eigenvects()
eigen2
Out[18]:
$$\left [ \left ( 0.234435562925363, \quad 1, \quad \left [ \left[\begin{matrix}-0.882782218537319\\1.0\end{matrix}\right]\right ]\right ), \quad \left ( 4.26556443707464, \quad 1, \quad \left [ \left[\begin{matrix}1.13278221853732\\1.0\end{matrix}\right]\right ]\right )\right ]$$

We have one eigenvalue of 0.234 with a corresponding eigenvector of (-0.883, 1.0), and another eigenvalue of 4.266 with a corresponding eigenvector of (1.133, 1.0).
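As a quick sanity check (added), the eigenvalues of a matrix should sum to its trace and multiply to its determinant:

eigvals = [ev[0] for ev in eigen2]
print(sum(eigvals), Sigma_2.trace())           # both should be 4.5
print(eigvals[0] * eigvals[1], Sigma_2.det())  # both should be 1.0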

The smaller eigenvalue is the square of the length of the semi-minor axis of the ellipse, and the larger eigenvalue is the square of the length of the semi-major axis. The corresponding eigenvectors give the directions of the semi-minor and semi-major axes. Let's plot this ellipse on the same plot as the data from the Monte Carlo simulation. First, we have to identify the angle of the major axis relative to the X-axis. While we're at it, we'll write a function to extract the eigenvalues and eigenvectors from the Sympy matrix.

In [19]:
import math

def get_eigen_data(eigenvects):
    if eigenvects[0][0] > eigenvects[1][0]:
        idx_max = 0
        idx_min = 1
    else:
        idx_max = 1
        idx_min = 0
        
    return {"eigvec_major": list(eigenvects[idx_max][2][0]),
            "eigvec_minor": list(eigenvects[idx_min][2][0]),
            "eigval_major": eigenvects[idx_max][0],
            "eigval_minor": eigenvects[idx_min][0],
            "theta_major": math.atan(eigenvects[idx_max][2][0][1]/
                               eigenvects[idx_max][2][0][0])*180/math.pi,
            "theta_minor": math.atan(eigenvects[idx_min][2][0][1]/
                               eigenvects[idx_min][2][0][0])*180/math.pi,
           }


ed2 = get_eigen_data(eigen2)
ed2
Out[19]:
{'eigval_major': 4.26556443707464,
 'eigval_minor': 0.234435562925363,
 'eigvec_major': [1.13278221853732, 1.00000000000000],
 'eigvec_minor': [-0.882782218537319, 1.00000000000000],
 'theta_major': 41.43749182554911,
 'theta_minor': -48.562508174450905}

Now that we have a data structure that's more intuitively labeled, we'll plot the corresponding ellipse along with the results of our numerical simulation. We'll center the ellipse at the average robot state: position = 0 and velocity = 0.

In [20]:
import matplotlib.patches as mpatches
import matplotlib.lines as lines
import numpy as np

# Setup plot
plt.close()
fig2 = plt.figure()
ax = fig2.add_subplot(111)

# Set limits of X and Y axes
lim = 6
ax.set_xlim(-lim, lim)
ax.set_ylim(-lim, lim)
ax.set_aspect('equal', adjustable='datalim')

# Add ellipse
ell2 = mpatches.Ellipse((0,0), math.sqrt(ed2["eigval_major"])*2,
                        math.sqrt(ed2["eigval_minor"])*2, ed2["theta_major"],
                        fill=False, edgecolor="red")
ax.add_patch(ell2)

# Add lines showing major and minor axes
ax_major_hor = [0 - ed2["eigvec_major"][0]*4, 0 + ed2["eigvec_major"][0]*4]
ax_major_ver = [0 - ed2["eigvec_major"][1]*4, 0 + ed2["eigvec_major"][1]*4]
ax_minor_hor = [0 - ed2["eigvec_minor"][0]*2, 0 + ed2["eigvec_minor"][0]*2]
ax_minor_ver = [0 - ed2["eigvec_minor"][1]*2, 0 + ed2["eigvec_minor"][1]*2]
plt.plot(ax_major_hor, ax_major_ver, color="grey")
plt.plot(ax_minor_hor, ax_minor_ver, color="silver")
ax.annotate("Major Axis",
            xy=(ed2["eigvec_major"][0]*3, ed2["eigvec_major"][1]*3),
            xytext=(0.6, 0.9), textcoords="axes fraction",
            xycoords="data",
            arrowprops=dict(facecolor='grey', shrink=0.05)
           )
ax.annotate("Minor Axis",
            xy=(ed2["eigvec_minor"][0], ed2["eigvec_minor"][1]),
            xytext=(0.15, 0.6), textcoords="axes fraction",
            xycoords="data",
            arrowprops=dict(facecolor='silver', shrink=0.05)
           )

# Add results of Monte Carlo simulation
ax.scatter(np.array(sim2_data.Position),
           np.array(sim2_data.Velocity), marker=".")

# Add titles and Labels
title = ("Robot Position at Time t = 2, 100 Replications\n"
         "covariance ellipse of 1 standard deviation")
plt.title(title)
plt.xlabel("Position")
plt.ylabel("Velocity")

fig2.show()

The uncertainty ellipse is approximately the same shape as our data, but only about a third of the data points fall within the ellipse. There is a reason for this: the eigenvalues are the variances in the directions of the ellipse axes. The ellipse axes are labeled, and they are not the same as the horizontal and vertical plot axes (which represent position and velocity). Since we used the square roots of the eigenvalues when we plotted the ellipse, the length of the ellipse covers data that is within one standard deviation of the mean along the major axis, and the width covers data that is within one standard deviation along the minor axis. Data points that fall within the ellipse are within one standard deviation on both the minor and major axes.
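We can make that statement more precise (an added check). For a two-dimensional Gaussian, the squared Mahalanobis distance of each point from the mean follows a chi-square distribution with 2 degrees of freedom, so the expected fraction of points inside the 1-standard-deviation ellipse is $1 - e^{-1/2} \approx 0.39$, consistent with the roughly one third we see on the plot:

# Squared Mahalanobis distance of each simulated point from the mean (0, 0).
Sigma_2_inv = np.array(Sigma_2.inv().tolist(), dtype=float)
pts = np.column_stack([sim2_data.Position, sim2_data.Velocity])
d2 = np.einsum('ij,jk,ik->i', pts, Sigma_2_inv, pts)
print("Fraction inside the 1-sigma ellipse:", np.mean(d2 <= 1.0))  # expect roughly 0.39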

The ellipse can be scaled to contain more of the data points. To get approximately 95% of the data points within the ellipse, we multiply the eigenvalues by 5.992 before we take the square root. (The reason for this is that the squared Mahalanobis distance of the data points from the mean follows a chi-square probability distribution with two degrees of freedom, and 5.992 is the 95th percentile of that distribution - I am not going to try to derive that here.) For a 90% ellipse use 4.605, and use 9.210 for a 99% ellipse.
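Here is where those scale factors come from (an added check; stats is scipy.stats, which was imported earlier):

# Chi-square quantiles with 2 degrees of freedom.
for conf in (0.90, 0.95, 0.99):
    print(conf, stats.chi2.ppf(conf, df=2))  # 4.605..., 5.991..., 9.210...

Let's try a 95% ellipse: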

In [21]:
def plot_sim2(sim_data, time, cov_mtx):
    plt.close()
    
    # Plot data from Monte Carlo simulation
    plt.plot(sim_data.Position, sim_data.Velocity, 'b.')
    
    # Set plot size
    ax = plt.gca()
    hor_cov = float(cov_mtx.row(0).col(0)[0])
    ver_cov = float(cov_mtx.row(1).col(1)[0])
    scale = 3
    ax.set_xlim(-hor_cov*scale, hor_cov*scale)
    ax.set_ylim(-ver_cov*scale, ver_cov*scale)
    ax.set_aspect('equal', adjustable='datalim')

    # Add ellipse
    eig = get_eigen_data(cov_mtx.eigenvects())
    ell2 = mpatches.Ellipse((0,0), math.sqrt(eig["eigval_major"]*5.992)*2,
                            math.sqrt(eig["eigval_minor"]*5.992)*2,
                            eig["theta_major"], fill=False, edgecolor="red")
    ax.add_patch(ell2)
    
    # Add labels and title
    title = ("Robot Position at Time t = {0}, "
             "{1} Replications\n"
             "95% Covariance Ellipse".format(time, sim_data.shape[0]))
    plt.title(title)
    plt.xlabel("Position")
    plt.ylabel("Velocity")
    plt.show()
    
plot_sim2(sim2_data, 2, Sigma_2)

Yay!!! Math works!!! We expect, on average, that only about 5 of the 100 data points will fall outside the 95% ellipse, but that won't happen every time.

Note: many Internet sources use the term error ellipse, which can be drawn at any desired confidence level (e.g., 90%, 95%, 99%). The only reference I have seen to the term uncertainty ellipse is in Probabilistic Robotics.

We can also compare the analytical covariance matrix to the corresponding matrix from the Monte Carlo simulation. Here is the analytical matrix:

In [22]:
Sigma_2
Out[22]:
$$\left[\begin{matrix}2.5 & 2.0\\2.0 & 2\end{matrix}\right]$$

And here is the corresponding matrix from the Monte Carlo simulation (using 1000 replications for greater precision):

In [23]:
rnd_sim(2, 1000).cov()
Out[23]:
Position Velocity
Position 2.400100 1.981144
Velocity 1.981144 2.049578

I think they match up pretty well.

Error Ellipse at t = 1

For the remainder of the notebook I'll plot 95% error ellipses instead of the uncertainty ellipse that was requested in the exercise. The only difference is the factor of $\sqrt{5.992}$, and I think the 95% error ellipse shows the relation between the analytical and numerical results more clearly.

Let's step back to time $t = 1$ and explore how we would graphically represent the 95% interval:

In [24]:
plot_sim2(sim1_data, 1, Sigma_1)

The same code that plotted the covariance ellipse in the other graphs plotted this graph. For $t = 1$, the covariance matrix $\mathbf{\Sigma}_1$ is degenerate and has a determinant of 0. Consequently, the eigenvalue that would have corresponded to the semi-minor axis of the ellipse is 0. Plotting an ellipse with one axis of length 0 results in a straight line.

I've plotted this chart several times. There has never been a data point that did not fit on the line segment, which shouldn't happen if it's really a 95% interval. I suspect that I should be multiplying the eigenvalue by 3.841 instead of 5.992 before taking the square root. The Chi-Squared probability distribution depends on the degrees of freedom. In the case of an ellipse, there are two degrees of freedom, and the critical value of the Chi-Squared distribution corresponding to 0.05 is 5.992. At time $t = 1$, where all points plot on a line, there is only one degree of freedom, and the corresponding critical value is 3.841. I'll leave verification of this hypothesis as an exercise for the reader.

Note: I'm getting these critical values from an appendix to my statistics text from my course at Naval Postgraduate School, Probability and Statistics for Engineering and the Sciences, Jay L. Devore.
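We can at least check the hypothesized critical value with scipy.stats (an added check):

# 95th percentile of the chi-square distribution with 1 degree of freedom.
print(stats.chi2.ppf(0.95, df=1))  # approximately 3.841

Here are the eigenvalues and eigenvectors for $t = 1$: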

In [25]:
ed1 = get_eigen_data(Sigma_1.eigenvects())
ed1
Out[25]:
{'eigval_major': 1.25000000000000,
 'eigval_minor': 0,
 'eigvec_major': [0.500000000000000, 1.00000000000000],
 'eigvec_minor': [-2.00000000000000, 1.00000000000000],
 'theta_major': 63.43494882292201,
 'theta_minor': -26.56505117707799}

We'll also compare the analytical and numeric covariance matrices for time $t = 1$. Here is the analytical matrix:

In [26]:
Sigma_1
Out[26]:
$$\left[\begin{matrix}0.25 & 0.5\\0.5 & 1\end{matrix}\right]$$

And here is the numeric matrix:

In [27]:
rnd_sim(1, 1000).cov()
Out[27]:
Position Velocity
Position 0.251812 0.503625
Velocity 0.503625 1.007249

t = 3

$\mathbf{\Sigma}_3$ was already calculated in section 4 above.

Now run the Monte Carlo simulation for time $t = 3$ and plot the results:

In [28]:
sim3_data = rnd_sim(3, 100)
plot_sim2(sim3_data, 3, Sigma_3)

Comparison of covariance matrices:

In [29]:
rnd_sim(3, 1000).cov()
Out[29]:
Position Velocity
Position 8.500348 4.390174
Velocity 4.390174 2.890584
In [30]:
Sigma_3
Out[30]:
$$\left[\begin{matrix}8.75 & 4.5\\4.5 & 3\end{matrix}\right]$$

t = 4

$ \mathbf{\Sigma}_4$

In [31]:
sim4_data = rnd_sim(4, 100)
plot_sim2(sim4_data, 4, Sigma_4)
In [32]:
rnd_sim(4, 1000).cov()
Out[32]:
Position Velocity
Position 20.733768 7.893074
Velocity 7.893074 3.907610
In [33]:
Sigma_4
Out[33]:
$$\left[\begin{matrix}21.0 & 8.0\\8.0 & 4\end{matrix}\right]$$

t = 5

$ \mathbf{\Sigma}_5$

In [34]:
sim5_data = rnd_sim(5, 100)
plot_sim2(sim5_data, 5, Sigma_5)
In [35]:
rnd_sim(5, 10000).cov()
Out[35]:
Position Velocity
Position 41.232678 12.489124
Velocity 12.489124 5.024705
In [36]:
Sigma_5
Out[36]:
$$\left[\begin{matrix}41.25 & 12.5\\12.5 & 5\end{matrix}\right]$$

7. Correlation at Infinity

Exercise 3-1(e)

(e) What will happen to the correlation between $x_t$ and $\dot x_t$ as $t \rightarrow \infty$?


Solution

We'll write a couple Python functions to calculate correlation at various values of time $t$.

In [37]:
def correlation(sigma):
    cov = sigma.row(0).col(1)[0]
    sig_pos = math.sqrt(sigma.row(0).col(0)[0])
    sig_vel = math.sqrt(sigma.row(1).col(1)[0])
    return cov / (sig_pos * sig_vel)

# Check that function works
correlation(Sigma_5)
Out[37]:
$$0.870388279778489$$
In [38]:
def cov_mat(t):
    if t == 1:
        return R
    sig = R
    for idx in range(1, t):
        sig_t = A * sig * A.T + R
        sig = sig_t
    return sig_t

{n: correlation(cov_mat(n)) for n in [1, 5, 10, 50, 100,
                                      500, 1000, 5000, 10000]}
Out[38]:
$$\left \{ 1 : 1.0, \quad 5 : 0.870388279778489, \quad 10 : 0.86710996952412, \quad 50 : 0.866068708302494, \quad 100 : 0.866036229304965, \quad 500 : 0.866025836797465, \quad 1000 : 0.866025512037634, \quad 5000 : 0.866025408114566, \quad 10000 : 0.86602540486697\right \}$$

Correlation appears to converge to a constant 0.866025 as $t \rightarrow \infty$. Can we pin down that value analytically? It turns out we can.
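The value 0.866025 is suspiciously close to $\sqrt{3}/2$. Unrolling equations (3) and (4) gives $\dot x_t = \sum_{j=1}^{t} \ddot x_j$ and $x_t = \sum_{j=1}^{t} \ddot x_j \left(t - j + \frac{1}{2}\right)$, and since the accelerations are independent with variance 1, the variances and covariance follow directly from those coefficients. Here is a sketch of the limit with Sympy (an addition, not part of the original notebook):

import sympy

t, j = sympy.symbols('t j', positive=True, integer=True)

# Coefficient of acceleration a_j is 1 in the velocity sum and (t - j + 1/2) in the position sum.
var_vel = sympy.summation(1, (j, 1, t))                                  # = t
cov     = sympy.summation(t - j + sympy.Rational(1, 2), (j, 1, t))       # = t**2 / 2
var_pos = sympy.summation((t - j + sympy.Rational(1, 2))**2, (j, 1, t))  # leading term t**3 / 3

rho = sympy.simplify(cov / sympy.sqrt(var_pos * var_vel))
print(sympy.limit(rho, t, sympy.oo))  # sqrt(3)/2, approximately 0.866025

As a cross-check, at $t = 5$ these sums give a position variance of 41.25, a velocity variance of 5, and a covariance of 12.5, matching $\mathbf{\Sigma}_5$ above.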

Stacy Irwin, 29 July 2018