Stochastic Optimal Control - LQG and Kalman Filters

Key Concepts

LQG (Linear Quadratic Gaussian) control extends the Linear Quadratic Regulator (LQR) to systems with stochastic (noisy) dynamics and partial state observations.
Two primary types of noise: dynamics noise (process noise) and sensor noise.
Observations are noisy and typically do not provide full state information.
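Key Results: Separation Principle and Certainty Equivalence.
- Separation Principle: The optimal estimator (Kalman filter) and the optimal state-feedback controller (LQR) can be designed independently and then combined.
- Certainty Equivalence: The optimal control applies the LQR gain to the state estimate as if it were the true state.
- Neither result holds in the general nonlinear case, but both are fundamental to LQG and widely used in practice.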
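
For reference, the LQG setting can be written out as below. The symbols are assumptions chosen for this sketch rather than notation taken from the lecture: A, B, C are the dynamics, input, and measurement matrices; W and V are the process- and sensor-noise covariances; Q, R, Q_N are the LQR cost weights; K_k is the LQR gain.

```latex
\begin{align}
x_{k+1} &= A x_k + B u_k + w_k, & w_k &\sim \mathcal{N}(0, W) \\
y_k &= C x_k + v_k, & v_k &\sim \mathcal{N}(0, V) \\
J &= \mathbb{E}\left[ x_N^\top Q_N x_N + \sum_{k=0}^{N-1} \left( x_k^\top Q x_k + u_k^\top R u_k \right) \right] \\
u_k &= -K_k \hat{x}_{k|k}, & \hat{x}_{k|k} &= \mathbb{E}\left[ x_k \mid y_{1:k} \right]
\end{align}
```

The last line is the certainty-equivalent controller: K_k is the same gain LQR would compute for the noise-free, fully observed problem, and the Kalman filter supplies the estimate it acts on.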

Discussed reinforcement learning methods that do not rely on the separation principle, using a history of observations instead of a single state vector.
Recent work by Ben Recht (Berkeley) and Sarah Dean (Cornell) showed model-based RL is more data-efficient than model-free RL in the LQG setting.

Optimal Estimation: Compute the best estimate of the current state, then feed that estimate to a controller such as LQR.
Filter Structure: Recursive, linear, MMSE (Minimum Mean Squared Error) estimator.
Estimation Steps: Prediction and Measurement Update.
- Prediction: Uses the system's dynamics to predict the next state and its covariance.
- Measurement Update: Corrects the prediction with the new measurement by computing the Kalman gain and updating the state estimate and its covariance.
Kalman Gain: Weights the measurement prediction error when correcting the estimate; the more uncertain the prediction is relative to the sensor, the larger the gain.
Innovation & Innovation Covariance: The innovation is the difference between the actual and the predicted measurement (the 'surprise'); its covariance describes how large that difference is expected to be.
Joseph Form: Numerically stable way to update the state covariance to ensure it remains positive semi-definite.
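
The prediction and measurement-update steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's own code: the function names kf_predict and kf_update, the symbols (A, B, C for the model, W and V for the process- and sensor-noise covariances, L for the Kalman gain), and all numeric values are assumptions made for the example.

```python
import numpy as np

def kf_predict(x, P, A, B, u, W):
    """Prediction: push the estimate and its covariance through the dynamics."""
    x_pred = A @ x + B @ u
    P_pred = A @ P @ A.T + W
    return x_pred, P_pred

def kf_update(x_pred, P_pred, y, C, V):
    """Measurement update with a Joseph-form covariance update."""
    z = y - C @ x_pred                       # innovation
    S = C @ P_pred @ C.T + V                 # innovation covariance
    # Kalman gain L = P C^T S^{-1}; solve() avoids forming the inverse.
    # S and P_pred are symmetric, so (S^{-1} C P)^T = P C^T S^{-1}.
    L = np.linalg.solve(S, C @ P_pred).T
    x_post = x_pred + L @ z
    I_LC = np.eye(len(x_pred)) - L @ C
    P_post = I_LC @ P_pred @ I_LC.T + L @ V @ L.T   # Joseph form
    return x_post, P_post, z, S

# Illustrative double-integrator model with a position-only sensor (dt = 0.1).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
C = np.array([[1.0, 0.0]])
W = 1e-3 * np.eye(2)
V = np.array([[0.05]])

x0, P0 = np.zeros(2), np.eye(2)
x_pred, P_pred = kf_predict(x0, P0, A, B, np.array([0.0]), W)
x_post, P_post, z, S = kf_update(x_pred, P_pred, np.array([0.3]), C, V)
```

The Joseph form is a sum of two terms of the shape M P M^T, so the updated covariance stays symmetric and positive semi-definite even when the gain is slightly off due to rounding, which the shorter update (I - L C) P does not guarantee.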

Nonlinear MMSE estimation can be posed as an optimal control problem with a cost function that penalizes measurement errors and process noise.
Insights: The estimation problem has the same structure as LQR: the Kalman filter's covariance recursion is a Riccati recursion, dual to the one that yields the LQR gain.
This viewpoint allows generalization to more complex, nonlinear systems using approaches like the Extended Kalman Filter (EKF), Unscented Kalman Filter (UKF), and particle filters.
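
As one concrete instance of the nonlinear extensions named above, the sketch below shows a single EKF step, which reuses the Kalman filter equations with Jacobians of the dynamics and measurement functions in place of A and C. Everything here is a hypothetical example rather than material from the lecture: the function ekf_step, the pendulum model (Euler-discretized, with sin(theta) as the measurement), and the noise levels are all assumptions.

```python
import numpy as np

def ekf_step(x, P, u, y, f, F_jac, h, H_jac, W, V):
    """One EKF predict + update, linearizing about the current estimate."""
    # Predict through the nonlinear dynamics; propagate covariance with the Jacobian.
    x_pred = f(x, u)
    F = F_jac(x, u)
    P_pred = F @ P @ F.T + W
    # Update with the measurement model linearized at the predicted state.
    H = H_jac(x_pred)
    z = y - h(x_pred)                        # innovation
    S = H @ P_pred @ H.T + V                 # innovation covariance
    L = np.linalg.solve(S, H @ P_pred).T     # Kalman gain P H^T S^{-1}
    x_post = x_pred + L @ z
    I_LH = np.eye(len(x)) - L @ H
    P_post = I_LH @ P_pred @ I_LH.T + L @ V @ L.T   # Joseph form
    return x_post, P_post

# Hypothetical pendulum: state = [angle, angular velocity], Euler step dt.
dt, g_over_l = 0.01, 9.81

def f(x, u):
    return np.array([x[0] + dt * x[1],
                     x[1] + dt * (-g_over_l * np.sin(x[0]) + u[0])])

def F_jac(x, u):
    return np.array([[1.0, dt],
                     [-dt * g_over_l * np.cos(x[0]), 1.0]])

def h(x):
    return np.array([np.sin(x[0])])          # horizontal position of the bob

def H_jac(x):
    return np.array([[np.cos(x[0]), 0.0]])

rng = np.random.default_rng(0)
W = 1e-5 * np.eye(2)
V = np.array([[1e-2]])
x_true = np.array([0.5, 0.0])
x_hat, P0 = np.zeros(2), np.eye(2)           # deliberately wrong initial guess
P = P0.copy()
u = np.array([0.0])
for _ in range(200):
    x_true = f(x_true, u) + rng.multivariate_normal(np.zeros(2), W)
    y = h(x_true) + rng.multivariate_normal(np.zeros(1), V)
    x_hat, P = ekf_step(x_hat, P, u, y, f, F_jac, h, H_jac, W, V)
```

Unlike the linear filter, the EKF's covariance depends on the estimates it linearizes about, so it is only an approximation of the true error covariance and can become overconfident when the nonlinearity is strong.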

Numerical Stability: Use Joseph Form and consider implementing square-root filters for better numerical properties.
Outlier Rejection: Use the innovation and its covariance to discard unlikely measurements, for example by gating on the innovation's Mahalanobis distance.
Nonlinear Extensions: EKF, UKF, iterated EKF, and particle filters handle nonlinear dynamics and measurement models.
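
The outlier-rejection idea above is often implemented as a gate on the squared Mahalanobis distance of the innovation. The sketch below is one such gate under stated assumptions: the function name passes_gate, the 99% chi-square threshold, and the example numbers are illustrative choices, not values from the lecture.

```python
import numpy as np

def passes_gate(z, S, threshold):
    """Accept a measurement if its innovation z is plausible under covariance S.

    The squared Mahalanobis distance z^T S^{-1} z follows a chi-square
    distribution (dof = measurement dimension) when the filter is consistent.
    """
    d2 = float(z @ np.linalg.solve(S, z))
    return d2 <= threshold

CHI2_99_1DOF = 6.635          # 99% quantile of chi-square with 1 degree of freedom

S = np.array([[0.25]])        # innovation covariance (standard deviation 0.5)
ok = passes_gate(np.array([0.6]), S, CHI2_99_1DOF)    # d2 = 1.44, accepted
bad = passes_gate(np.array([3.0]), S, CHI2_99_1DOF)   # d2 = 36.0, rejected
```

A rejected measurement is simply skipped: the filter keeps the predicted state and covariance for that step and continues.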

Practical demonstrations using a simple double integrator system and LQR control.
Highlighted the importance of tuning and how Kalman filters can remain consistent despite noisy measurements.
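
A demonstration along these lines can be reproduced with the sketch below, which closes the loop on a double integrator using an LQR gain applied to a Kalman filter estimate. It is not the lecture's demo: the helper names dlqr and steady_kalman_gain, the use of fixed steady-state gains obtained by iterating the Riccati recursions, and all weights and noise levels are assumptions chosen for illustration.

```python
import numpy as np

def dlqr(A, B, Q, R, iters=500):
    """Infinite-horizon LQR gain via fixed-point iteration of the Riccati recursion."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

def steady_kalman_gain(A, C, W, V, iters=500):
    """Steady-state Kalman gain via the dual (forward) Riccati recursion."""
    P = W.copy()                              # predicted covariance
    for _ in range(iters):
        S = C @ P @ C.T + V
        L = np.linalg.solve(S, C @ P).T
        P = A @ (P - L @ C @ P) @ A.T + W
    S = C @ P @ C.T + V
    return np.linalg.solve(S, C @ P).T

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])         # double integrator
B = np.array([[0.5 * dt**2], [dt]])
C = np.array([[1.0, 0.0]])                    # position-only sensor
W = 1e-4 * np.eye(2)                          # process-noise covariance
V = np.array([[1e-2]])                        # sensor-noise covariance

K = dlqr(A, B, np.eye(2), np.array([[0.1]]))
L = steady_kalman_gain(A, C, W, V)

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0])                      # true state
x_hat = np.zeros(2)                           # filter starts with a wrong guess
for _ in range(200):
    u = -K @ x_hat                            # certainty-equivalent control
    x = A @ x + B @ u + rng.multivariate_normal(np.zeros(2), W)
    y = C @ x + rng.multivariate_normal(np.zeros(1), V)
    x_pred = A @ x_hat + B @ u                # prediction
    x_hat = x_pred + L @ (y - C @ x_pred)     # measurement update
```

Tuning here means choosing W and V (how much the filter trusts the model versus the sensor) and the LQR weights; changing the ratio of W to V shifts how quickly the estimate reacts to new measurements.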