Extended Kalman Filter

Exercise 3-4 from Probabilistic Robotics

This notebook addresses the extended Kalman filter from exercise 3-4 (page 83) in Probabilistic Robotics by Thrun, Burgard, and Fox.

In [1]:
# Setup notebook for plotting and calculations

# Plotting
import matplotlib
matplotlib.use('nbAgg')
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
# import matplotlib.lines as lines

# Numerical simulations
import math
import numpy as np
import scipy.stats as stats

# Matrix calculations
import sympy
from IPython.display import display
sympy.init_printing(use_latex=True)

1. Scenario Description

Exercise 3-4

  1. We noted in the text that the EKF linearization is an approximation. To see how bad this approximation is, we ask you to work out an example. Suppose we have a mobile robot operating in a planar environment. Its state is its x-y-location and its global heading direction $\theta$. Suppose we know $x$ and $y$ with high certainty, but the orientation $\theta$ is unknown. This is reflected by our initial estimate

$$ \tag{1} \vec \mu = \left\lgroup \begin{matrix}0 & 0 & 0 \end{matrix} \right\rgroup $$

$$ \tag{2} \mathbf{\Sigma} = \left\lgroup \begin{matrix}0.01 & 0 & 0 \cr 0 & 0.01 & 0 \cr 0 & 0 & 10000 \end{matrix} \right\rgroup $$


Discussion

The covariance matrix $\Sigma$ indicates we know $x$ and $y$ with a high degree of precision (variance $= 0.01$), but we have little knowledge regarding $\theta$. I do wish the authors had specified units for $\theta$. If they are using radians, then a single standard deviation ($\sigma = 100$) extends through many full rotations of the angle, so we might as well treat the probability distribution of $\theta$ as uniform, extending from $0$ to $2\pi$. On the other hand, if the units are degrees, then a single standard deviation extends slightly beyond the vertical y-axis, and there is a higher probability that the robot is pointed to the right (positive x direction). Note, however, that an entire circle fits within two standard deviations (a span of 400 degrees), and there is a significant probability of observing values beyond a single standard deviation from the mean, so we cannot ignore the possibility that the robot is pointed to the left.
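As a quick sanity check (my own addition, relying on the imports from the setup cell above), we can compute the probability that the robot points into the right half-plane under each interpretation of $\sigma = 100$:

# Probability that |theta| < 90 degrees under each interpretation (sketch)
p_right_deg = stats.norm.cdf(90, scale=100) - stats.norm.cdf(-90, scale=100)

# For radians, sum the probability mass of every 2*pi-periodic interval
# that wraps onto the right half-plane
p_right_rad = sum(
    stats.norm.cdf(2*math.pi*k + math.pi/2, scale=100)
    - stats.norm.cdf(2*math.pi*k - math.pi/2, scale=100)
    for k in range(-100, 101))

print(p_right_deg)   # about 0.63
print(p_right_rad)   # about 0.50, i.e. effectively uniform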

2. Intuitive Solution

Exercise 3-4(a)

(a) Draw, graphically, your best model of the posterior over the robot pose after the robot moves $d = 1$ units forward. For this exercise, we assume the robot moves flawlessly without any noise. Thus, the expected location of the robot after motion will be $$ \tag{3} \left\lgroup \begin{matrix} x^\prime \cr y^\prime \cr \theta^\prime \end{matrix} \right\rgroup = \left\lgroup \begin{matrix} x + \cos \theta \cr y + \sin \theta \cr \theta \end{matrix} \right\rgroup $$ For your drawing, you can ignore $\theta$ and only draw the posterior in x-y-coordinates.


Solution

If we know the robot's initial x-y-position is at $(0, 0)$ and that it moves forward one unit, but we don't know the heading, then after the movement the robot is somewhere on a circle centered at $(0, 0)$ with radius = 1.

In [2]:
# Setup plot
plt.close()
fig = plt.figure()
ax = fig.add_subplot(111)

scale = 1.5
ax.set_xlim(-scale, scale)
ax.set_ylim(-scale, scale)
ax.set_aspect('equal', adjustable='datalim')
plt.grid()

circ = mpatches.Circle((0, 0), 1, fill=False, color="Purple",
                       linewidth=3)
ax.add_patch(circ)

title = "Robot Position at Time t = 1\nIntuitive Solution"
plt.title(title)
plt.xlabel("x")
plt.ylabel("y")

plt.show()

Numerical Simulation

Let's use a numerical simulation to see how accurate the intuitive solution is.

In [3]:
def sim_move(n=100, degrees=False):
    x0 = stats.norm.rvs(size=n, scale=0.1)
    y0 = stats.norm.rvs(size=n, scale=0.1)
    th0 = stats.norm.rvs(size=n, scale=100)    
    
    if degrees:
        # Interpret the sampled headings as degrees; math.cos and math.sin
        # expect radians, so convert.
        units = math.pi/180
    else:
        units = 1

    x1 = []
    y1 = []
    for idx in range(n):
        x1.append(x0[idx] + math.cos(th0[idx]*units))
        y1.append(y0[idx] + math.sin(th0[idx]*units))
    th1 = th0
       
    return {"x0": x0, "y0": y0, "th0": th0, "x1": x1, "y1": y1, "th1": th1}
In [4]:
data_rad = sim_move()
data_deg = sim_move(degrees=True)
plt.close()

def plot_step(data, subtitle):
    plt.plot(data["x0"], data["y0"], "b.")
    plt.plot(data["x1"], data["y1"], "r.")
    ax = plt.gca()
    scale = 2
    ax.set_xlim(-scale, scale)
    ax.set_ylim(-scale, scale)
    ax.set_aspect('equal', adjustable='datalim')
    plt.grid()
    plt.title("Robot Position at Times 1 and 2\n" + subtitle)
    plt.xlabel("x")
    plt.ylabel("y")
    
plt.subplot(121)
plot_step(data_rad, "Radians")
plt.subplot(122)
plot_step(data_deg, "Degrees")
plt.show()

The plots show the results of 100 random simulations of the state transition between times $t = 0$ and $t = 1$. The robot position at time $t = 0$ is in blue, and the position at time $t = 1$ is in red. With the variance interpreted in radians, the sampled headings wrap around the circle many times, so the red points spread roughly uniformly around the unit circle. With the variance interpreted in degrees, the headings concentrate toward the positive x direction, so the red points should cluster more heavily on the right half of the circle, as anticipated in the discussion of section 1.

3. Prediction Step

Exercise 3-4(b)

(b) Now develop this motion into a prediction step for the EKF. For that you have to define a state transition function and linearize it. You then have to generate a new Gaussian estimate of the robot pose using the linearized model. You should give the exact mathematical equations for each of these steps, and state the Gaussian that results.


Solution

Determining $g(u_t, \vec \mu_{t-1})$

The EKF algorithm has a structure that is very similar to the standard Kalman filter. The first step is the state transition: $$ \tag{4} \overline{\vec \mu}_t = g(u_t, \vec \mu_{t-1})$$

The function $g(u_t, \vec \mu_{t-1})$ replaces the linear matrix equation used in the standard Kalman filter algorithm. From equation (3) we can see that:

$$ \tag{5} \vec \mu = \left\lgroup \begin{matrix} x \cr y \cr \theta \end{matrix} \right\rgroup $$

$$ \tag{6} u_t = d $$

$$ \tag{7} g(u_t, \vec \mu_{t-1}) = \left\lgroup \begin{matrix} x_{t-1} + d \, \cos \theta_{t-1} \cr y_{t-1} + d \, \sin \theta_{t-1} \cr \theta_{t-1} \end{matrix} \right\rgroup $$
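As a small symbolic sketch (my own addition), we can also write $g()$ in sympy; the symbol names are hypothetical and will be reused in the checks below.

# Symbolic sketch of the state transition function g()
x, y, th, d = sympy.symbols('x y theta d')
g = sympy.Matrix([x + d*sympy.cos(th),
                  y + d*sympy.sin(th),
                  th])
display(g)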

State Transition to $\overline{\vec \mu}$

From equation (1): $$ \tag{8} \vec \mu_0 = \left\lgroup \begin{matrix}0 \cr 0 \cr 0 \end{matrix} \right\rgroup $$

Applying equation (7) with $ d = 1$:

$$ \tag{9} \overline{\vec \mu_1} = \left\lgroup \begin{matrix}1 \cr 0 \cr 0 \end{matrix} \right\rgroup $$
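The same result falls out of the hypothetical symbolic sketch of $g()$ above by substitution:

# Evaluate g() at mu_0 = (0, 0, 0) with d = 1 (relies on the sketch above)
display(g.subs({x: 0, y: 0, th: 0, d: 1}))   # -> Matrix([1, 0, 0])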

Jacobian Matrix $G$

The state transition function $g()$ is nonlinear due to the trigonometric functions. The extended Kalman filter addresses the nonlinearity by approximating $g()$ with a linear function obtained from a first-order Taylor expansion. The first step in developing the linear approximation is to calculate the Jacobian matrix $\mathbf G$, the matrix of all first-order partial derivatives of $g()$ (see https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant):

$$ \tag{10} \mathbf G = \left\lgroup \begin{matrix} \frac{\partial g_x}{\partial x} & \frac{\partial g_x}{\partial y} & \frac{\partial g_x}{\partial \theta} \cr \frac{\partial g_y}{\partial x} & \frac{\partial g_y}{\partial y} & \frac{\partial g_y}{\partial \theta} \cr \frac{\partial g_\theta}{\partial x} & \frac{\partial g_\theta}{\partial y} & \frac{\partial g_\theta}{\partial \theta} \end{matrix} \right\rgroup $$

After calculating the first order derivatives and setting $d = 1$:

$$ \tag{11} \mathbf G = \left\lgroup \begin{matrix} 1 & 0 & -\sin \, \theta_{t-1} \cr 0 & 1 & \cos \, \theta_{t-1} \cr 0 & 0 & 1 \end{matrix} \right\rgroup $$

We get the best approximation by evaluating the Jacobian at the most likely state, which for the prediction step is the prior mean $\vec \mu_{t-1}$. In this case, that means setting $\theta = 0$.

$$ \tag{12} \mathbf G = \left\lgroup \begin{matrix} 1 & 0 & 0 \cr 0 & 1 & 1 \cr 0 & 0 & 1 \end{matrix} \right\rgroup $$
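If the symbolic sketch of $g()$ from above is available, sympy can verify equations (11) and (12) directly:

# Jacobian of g() with respect to the state (x, y, theta)
G_sym = g.jacobian(sympy.Matrix([x, y, th]))
display(G_sym.subs(d, 1))            # matches equation (11)
display(G_sym.subs({d: 1, th: 0}))   # matches equation (12)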

Predicted Covariance $\mathbf{\overline{\Sigma}}_1$

The equation for the predicted covariance is similar to the standard Kalman filter: $$ \tag{13} \overline{\mathbf{\Sigma}}_t = \mathbf G_t \mathbf \Sigma_{t-1} \mathbf G_t^T + \mathbf{R}_t $$

For this exercise we are assuming that there is no noise with respect to the robot motion. Therefore we can ignore the $\mathbf{R}_t$ matrix and equation (13) will simplify to $\overline{\mathbf{\Sigma}}_t = \mathbf G_t \mathbf \Sigma_{t-1} \mathbf G_t^T$.

In [5]:
# Define matrices
mu1_pred = sympy.Matrix([1, 0, 0])
sigma0 = sympy.Matrix([[0.01, 0, 0], [0, 0.01, 0], [0, 0, 10000]])
G1 = sympy.Matrix([[1, 0, 0], [0, 1, 1], [0, 0, 1]])

# Calculate predicted covariance
sigma1_pred = G1 * sigma0 * G1.T
sigma1_pred
Out[5]:
$$\left[\begin{matrix}0.01 & 0 & 0\\0 & 10000.01 & 10000\\0 & 10000 & 10000\end{matrix}\right]$$

Calculating the prediction step is complete now that we have the predicted mean $\overline{\vec \mu_1}$ and the predicted covariance $\overline{\mathbf{\Sigma}_1}$. Defining the Gaussian probability distribution for the prediction is straightforward: $$ \tag{14} p(\vec x) = \det(2\pi\overline{\mathbf{\Sigma}_1})^{-1/2} \exp \left(-\frac{1}{2}(\vec x - \overline{\vec \mu_1})^T \overline{\mathbf{\Sigma}_1}^{-1}(\vec x - \overline{\vec \mu_1})\right)$$
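As a hedged numerical sketch (my own addition), the same density can be evaluated with scipy, which handles the normalization constant for us:

# Evaluate the predicted Gaussian with scipy.stats.multivariate_normal
mean1 = np.array([1.0, 0.0, 0.0])
cov1 = np.array([[0.01, 0.0, 0.0],
                 [0.0, 10000.01, 10000.0],
                 [0.0, 10000.0, 10000.0]])
prior_pred = stats.multivariate_normal(mean=mean1, cov=cov1)
print(prior_pred.pdf([1.0, 0.0, 0.0]))   # density at the predicted mean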

4. Uncertainty Ellipse

Exercise 3-4(c)

(c) Draw the uncertainty ellipse of the Gaussian and compare it with your intuitive solution.


Solution

Trim to Two Dimensions

First we need to trim the mean and covariance matrix down to two dimensions by eliminating the elements related to $\theta$.

In [6]:
mu1_xy_pred = mu1_pred.copy()   # Copy() required because 
mu1_xy_pred.row_del(2)          #   row_del alters matrix in place

sigma1_xy_pred = sigma1_pred.copy()
sigma1_xy_pred.col_del(2)
sigma1_xy_pred.row_del(2)

print("mu:")
display(mu1_xy_pred)
print("Sigma:")
display(sigma1_xy_pred)
mu:
$$\left[\begin{matrix}1\\0\end{matrix}\right]$$
Sigma:
$$\left[\begin{matrix}0.01 & 0\\0 & 10000.01\end{matrix}\right]$$

Plot 95% Uncertainty Ellipse

In [7]:
def theta(vector):
    """Calculates the angle between a vector and x-axis.
    
    Args:
        vector: ([numeric]) Length two list of numeric values.
        
    Returns: (float) Angle between vector and x-axis.
    """
    if vector[0] == 0:
        if vector[1] > 0:
            return 90
        elif vector[1] < 0:
            return -90
        else:
            return None
    else:
        return math.atan(vector[1]/vector[0])*180/math.pi


def eigen_data(matrix):
    """Extracts eigenvalues, eigenvectors and angles from covariance matrix.
    """
    eigenvects = matrix.eigenvects()
    if eigenvects[0][0] > eigenvects[1][0]:
        idx_max = 0
        idx_min = 1
    else:
        idx_max = 1
        idx_min = 0
        
    return {"eigvec_major": list(eigenvects[idx_max][2][0]),
            "eigvec_minor": list(eigenvects[idx_min][2][0]),
            "eigval_major": eigenvects[idx_max][0],
            "eigval_minor": eigenvects[idx_min][0],
            "theta_major": theta(eigenvects[idx_max][2][0]),
            "theta_minor": theta(eigenvects[idx_min][2][0])}


def plot_ellipses(beliefs, colors, title, xscale=10, yscale=10):
    """Plots a series of uncertainty ellipses.
    
    Args:
        beliefs: [{"mu": mu, "Sigma": sigma}, ...]
    """
    plt.close()
    
    # Setup plot
    fig = plt.figure()
    ax = fig.add_subplot(111)
  
    # Set plot size
    center_x = float(beliefs[0]["mu"][0])
    center_y = float(beliefs[0]["mu"][1])
    ax.set_xlim(-xscale + center_x, xscale + center_x)
    ax.set_ylim(-yscale + center_y, yscale + center_y)
    # ax.set_aspect('equal', adjustable='datalim')
    
    # Plot ellipses
    for col_idx, belief in enumerate(beliefs):
        mu = belief["mu"]
        eig = eigen_data(belief["Sigma"])
        # 5.992 is the chi-square 95% quantile for 2 degrees of freedom;
        # Ellipse() expects the full width and height, hence the factor of 2.
        ell = mpatches.Ellipse((mu[0], mu[1]),
                               math.sqrt(eig["eigval_major"]*5.992)*2,
                               math.sqrt(eig["eigval_minor"]*5.992)*2,
                               angle=eig["theta_major"], fill=False,
                               edgecolor=colors[col_idx])
        ax.add_patch(ell)
        
    # Plot ellipse centers
    x = [belief["mu"][0] for belief in beliefs]
    y = [belief["mu"][1] for belief in beliefs]
    plt.plot(x, y, ".k-")
        
    # Add title and stuff
    plt.grid()
    plt.title(title)
    plt.xlabel("x")
    plt.ylabel("y")
    plt.show()
In [8]:
title = "Predicted Position (95%) at Time t = 1\nUsing Extended Kalman Filter"
plot_ellipses([{"mu": mu1_xy_pred, "Sigma": sigma1_xy_pred}], "Blue",
             title, xscale=2, yscale=300)

The 95% uncertainty ellipse is plotted (instead of the single-standard-deviation ellipse). The ellipse bears little resemblance to the intuitive or numerical solutions, indicating that the extended Kalman filter and its linear approximation do not work well in this case.
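To make the comparison concrete, here is a hedged numeric check (my own addition): the squared Mahalanobis distance of two poses on the intuitive circle under the EKF prediction. The pose $(1, 0)$ sits at the ellipse center, while the equally plausible pose $(-1, 0)$ lies far outside the 95% boundary (chi-square threshold of roughly 5.99 for 2 degrees of freedom).

# Squared Mahalanobis distance under the EKF prediction
def mahalanobis_sq(point, mu, sigma):
    diff = sympy.Matrix(point) - mu
    return (diff.T * sigma.inv() * diff)[0]

print(sympy.N(mahalanobis_sq([1, 0], mu1_xy_pred, sigma1_xy_pred)))    # 0
print(sympy.N(mahalanobis_sq([-1, 0], mu1_xy_pred, sigma1_xy_pred)))   # 400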

Effect of Uncertainty in $\theta$

I suspect that the inaccuracy is due to the excessive uncertainty in $\theta$. Let's see what happens when we shrink the variance in $\theta$ down to 1 degree. (This is not unreasonable. A NAVX-MXP inertial measurement card, which is commonly used on FRC robots during the 15 second autonomous phase, is rated at 1 degree of drift per minute).

In [9]:
sigma0_imp = sympy.Matrix([[0.01, 0, 0], [0, 0.01, 0], [0, 0, math.pi/180]])
sigma1_pred_imp = G1 * sigma0_imp * G1.T
sigma1_pred_imp
Out[9]:
$$\left[\begin{matrix}0.01 & 0 & 0\\0 & 0.0274532925199433 & 0.0174532925199433\\0 & 0.0174532925199433 & 0.0174532925199433\end{matrix}\right]$$
In [10]:
sigma1_xy_pred_imp = sigma1_pred_imp.copy()
sigma1_xy_pred_imp.col_del(2)
sigma1_xy_pred_imp.row_del(2)

title = "Predicted Position (95%) at Time t = 1\nImproved Heading Accuracy"
plot_ellipses([{"mu": mu1_xy_pred, "Sigma": sigma1_xy_pred_imp}], "Blue",
             title, xscale=2, yscale=1)

Much better, but the filter still overestimates the uncertainty in the y direction. Also, I'm assuming that the variance in $\theta$ should be expressed in radians. If I leave out the $\pi/180$ conversion, the uncertainty in the y direction is much greater.
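A quick hedged check of that last claim (my own addition, using an assumed $\theta$ variance of 1 with no radian conversion):

# Predicted y-variance when the theta variance is left unconverted
sigma0_noconv = sympy.Matrix([[0.01, 0, 0], [0, 0.01, 0], [0, 0, 1]])
print((G1 * sigma0_noconv * G1.T)[1, 1])   # 1.01, versus about 0.027 above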

5. Measurement Step

Exercise 3-4(d)

(d) Now incorporate a measurement. Our measurement shall be a noisy projection of the x-coordinate of the robot, with covariance $Q = 0.01$. Specify the measurement model. Now apply the measurement both to your intuitive posterior, and formally to the EKF estimate using the standard EKF machinery. Give the exact result of the EKF, and compare it with the result of your intuitive analysis.


Solution

Measurement Function

The measurement function $h(\vec x_t)$ is actually linear.

$$ \tag{15} h(\vec x_t) = x_t = (\begin{matrix} 1 & 0 & 0 \end{matrix}) \left\lgroup \begin{matrix} x_t \cr y_t \cr \theta_t \end{matrix} \right\rgroup = \mathbf{C} \left\lgroup \begin{matrix} x_t \cr y_t \cr \theta_t \end{matrix} \right\rgroup $$

$\mathbf{C} = ( 1 \; 0 \; 0 )$ is the measurement matrix from the standard Kalman filter algorithm. We can go ahead and calculate the linearized Jacobian matrix anyway:

$$ \tag{16} \mathbf{H}_t = \left\lgroup \frac{\partial{x_t}}{\partial{x}} \; \frac{\partial{x_t}}{\partial{y}} \; \frac{\partial{x_t}}{\partial{\theta}} \right\rgroup = (1 \; 0 \; 0) $$

This demonstrates that when the functions $g(\vec x_{t-1}, \vec u_{t-1})$ or $h(\vec x_t)$ are linear, the Jacobian matrices are the same as the matrices used in the standard Kalman filter algorithm.
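For completeness, a small symbolic check (my own addition, reusing the hypothetical symbols x, y, th from the prediction-step sketch):

# Jacobian of the (linear) measurement function h(x) = x
h = sympy.Matrix([x])
display(h.jacobian(sympy.Matrix([x, y, th])))   # -> [1  0  0], identical to C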

Measurement Noise

In this example, the covariance matrix for the measurement noise $\mathbf{Q}$ simplifies to a scalar value $\mathbf{Q} = 0.01$.

Kalman Gain

To calculate the Kalman gain $\mathbf{K}$:

$$ \tag{17} \mathbf{K_t} = \mathbf{\overline{\Sigma}}_t \mathbf{{H}}_t^T (\mathbf{H}_t \mathbf{\overline{\Sigma}}_t \mathbf{H}_t^T + \mathbf{Q}_t)^{-1} $$

In [11]:
Q = sympy.Matrix([0.01])
H1 = sympy.Matrix([[1, 0, 0]])
K1 = sigma1_pred * H1.T * (H1 * sigma1_pred * H1.T + Q)**-1
K1
Out[11]:
$$\left[\begin{matrix}0.5\\0\\0\end{matrix}\right]$$

Posterior Mean

In this example, the measurement $z_t$ is a scalar, not a vector.

$$ \tag{18} \vec \mu_t = \overline{\vec \mu}_t + \mathbf{K}_t (z_t - h({\overline{\vec \mu}_t})) $$

In [12]:
def mu1(z, mu_pred):
    return mu_pred + K1 * (z - mu_pred[0])

x_vals = np.arange(0, 2, .2)
y_vals = [sympy.N(mu1(x, mu1_pred)[0]) for x in x_vals]

plt.close()
plt.plot(x_vals, y_vals, "b")
plt.xlabel("Measurement z")
plt.ylabel("Posterior x position")
plt.title("Posterior Position at Time 1 vs Measurement")
plt.grid()
plt.show()

Posterior Covariance Matrix

$$ \tag{19} \mathbf{\Sigma}_t = (\mathbf{I} - \mathbf{K}_t \mathbf{H}_t) \mathbf{\overline\Sigma}_t $$

In [13]:
sigma1 = (sympy.eye(3) - K1 * H1) * sigma1_pred
print("posterior covariance matrix:")
display(sigma1)
print("predicted covariance matrix:")
display(sigma1_pred)
posterior covariance matrix:
$$\left[\begin{matrix}0.005 & 0 & 0\\0 & 10000.01 & 10000\\0 & 10000 & 10000\end{matrix}\right]$$
predicted covariance matrix:
$$\left[\begin{matrix}0.01 & 0 & 0\\0 & 10000.01 & 10000\\0 & 10000 & 10000\end{matrix}\right]$$

Except for reducing the variance of position $x$, the measurement has very little impact on the posterior covariance matrix. In hindsight, this is not a surprise. The uncertainty ellipse plotted in section 4 shows that almost all of the uncertainty is in the $y$ direction, so measuring the position along the x-axis does little to collapse the uncertainty ellipse.
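As a hedged illustration of that point (my own addition): if the measurement had instead projected the y-coordinate, so that $\mathbf{H} = (0 \; 1 \; 0)$, the large $y$ and $\theta$ variances would collapse.

# Counterfactual: measure y instead of x
H_y = sympy.Matrix([[0, 1, 0]])
K_y = sigma1_pred * H_y.T * (H_y * sigma1_pred * H_y.T + Q)**-1
display((sympy.eye(3) - K_y * H_y) * sigma1_pred)   # y and theta variances shrink dramatically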

Uncertainty Ellipse

First we calculate the posterior mean and trim the covariance matrix down to two dimensions.

In [14]:
mu1_z0 = mu1(1, mu1_pred)

sigma1_xy = sigma1.copy()
sigma1_xy.col_del(2)
sigma1_xy.row_del(2)
sigma1_xy
Out[14]:
$$\left[\begin{matrix}0.005 & 0\\0 & 10000.01\end{matrix}\right]$$

Now we'll plot the ellipse.

In [15]:
title = "Posterior Position (95%) at Time t = 1\nmeasurement x = 1"
plot_ellipses([{"mu": mu1_z0, "Sigma": sigma1_xy}], "Blue",
             title, xscale=1, yscale=300)

This is not too surprising. The x measurement did little to reduce our position uncertainty.

Applying EKF to Intuitive Solution

Our intuitive solution indicated that the true position at $t = 1$ could be anywhere on a circle of radius 1 centered at $(0, 0)$. Let's suppose our robot is actually at position $(-1, 0)$, which corresponds to $\theta = 180$ degrees. Due to the small variance in the x measurement, it's likely that our measurement will be close to $-1$. Let's plot the resulting uncertainty ellipse.

In [16]:
mu1_z180 = mu1(-1, mu1_pred)

title = "Posterior Position (95%) at Time t = 1\nmeasurement x = -1"
plot_ellipses([{"mu": mu1_z180, "Sigma": sigma1_xy}], "Blue",
             title, xscale=1, yscale=300)

Now that's messed up. The EKF is concluding that the robot is at position $(0, 0)$. But according to our intuitive solution we know the robot can't be there. And of course the uncertainty in the y direction is unrealistically large.

6. Discussion

Exercise 3-4(e)

(e) Discuss the difference between your estimate of the posterior, and the Gaussian produced by the EKF. How significant are those differences? What can be changed to make the approximation more accurate? What would have happened if the initial orientation had been known, but not the robot's y-coordinate?


My estimate of the posterior is an annulus centered at $(0, 0)$ with an average radius of 1. This differs from the EKF result: a narrow, vertical ellipse centered at $(1, 0)$, extending roughly from $-250$ to $+250$. The difference is significant because many positions that are likely under the intuitive solution do not fall within the EKF ellipse, while many impossible positions do.

The approximation becomes more accurate if the heading is known with greater certainty.

If the initial orientation had been known but not the y-coordinate, I would expect the EKF to produce an uncertainty ellipse with a long semi-major axis in the y direction, but a short semi-minor axis in the x direction, centered at the predicted position.
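A quick hedged check of that expectation (my own addition), reusing $\mathbf{G}_1$ from the prediction step with an assumed prior covariance that swaps the roles of $y$ and $\theta$:

# Prediction when orientation is known but y is not (assumed values)
sigma0_swap = sympy.Matrix([[0.01, 0, 0], [0, 10000, 0], [0, 0, 0.01]])
display(G1 * sigma0_swap * G1.T)
# The resulting covariance is concentrated almost entirely in y, giving a
# tall, narrow uncertainty ellipse centered at the predicted position.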