Lecture on Linear Quadratic Regulator (LQR), Riccati Recursion, and Dynamic Programming

Jul 16, 2024

Recap

  • Last time, we discussed LQR formulated as a Quadratic Program (QP).
  • Introduced the Riccati recursion as a method to solve LQR efficiently by exploiting the block sparsity of the QP.
  • Noted that the Riccati recursion yields a feedback policy, so the solution applies across different initial conditions.

Topics for Today

Infinite Horizon LQR

  • Definition: The LQR problem as the time horizon goes to infinity; the feedback gain matrix K converges to a constant steady-state value.
  • Relation to Systems: Useful for stabilization problems such as balancing a cart-pole system.
  • Toolboxes: MATLAB, Julia, and Python provide functions (e.g., dlqr and dare in MATLAB) to compute the infinite-horizon K matrix.
  • Methods: Iterating the Riccati recursion to a fixed point, or Newton-type iterations, can be used to compute the steady-state solution (see the sketch after this list).
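
As a concrete illustration, here is a minimal Python/NumPy sketch (the double-integrator A, B and the cost weights Q, R are made-up values, not from the lecture) that finds the steady-state gain by running the Riccati recursion until it converges, then cross-checks against SciPy's discrete algebraic Riccati equation solver:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Illustrative double-integrator dynamics and cost weights (assumed values).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.eye(2)          # state cost weight
R = np.array([[0.1]])  # control cost weight

# Run the Riccati recursion backwards until P reaches a fixed point.
P = Q.copy()
for _ in range(10_000):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P_next = Q + A.T @ P @ (A - B @ K)
    if np.allclose(P_next, P, atol=1e-10):
        break
    P = P_next

# Cross-check against the discrete algebraic Riccati equation (DARE) solver.
P_dare = solve_discrete_are(A, B, Q, R)
K_dare = np.linalg.solve(R + B.T @ P_dare @ B, B.T @ P_dare @ A)
print("steady-state K:", K)
print("matches DARE solution:", np.allclose(K, K_dare, atol=1e-6))
```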

Controllability

  • Controllability Conditions: Conditions on the system matrices A and B that guarantee the system can be driven to any state, which LQR requires.
  • Under-Determined Linear System: Reaching a target state in n steps amounts to solving an under-determined linear system for the control inputs, e.g., via the pseudo-inverse.
  • Controllability Matrix: Built from the system's dynamics over n time steps: C = [B, AB, A^2 B, ..., A^(n-1) B].
  • Full Rank Requirement: C must be full rank (rank C = n, where n is the dimension of the state vector); a rank-check sketch follows this list.
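
A minimal sketch of this rank test in Python/NumPy (the helper name and the example matrices are my own, for illustration):

```python
import numpy as np

def controllability_matrix(A, B):
    """Stack [B, AB, A^2 B, ..., A^(n-1) B] horizontally."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

# Illustrative double-integrator example (same assumed matrices as above).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])

C = controllability_matrix(A, B)
print("rank C =", np.linalg.matrix_rank(C))                  # 2 here
print("controllable:", np.linalg.matrix_rank(C) == A.shape[0])
```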

Dynamic Programming

  • Bellman's Principle of Optimality: Sub-trajectories of optimal trajectories are themselves optimal; equivalently, the cost-to-go from any state x at time k along an optimal trajectory is minimal.
  • Optimal Cost-to-Go/Value Function: Denoted V(x); encodes the minimum cost to reach the goal from state x when acting optimally.
  • Backward Recursion: The value function is computed by starting from the terminal cost and working backwards in time (written out below).
  • Policy: Obtained by minimizing the stage cost plus the cost-to-go at each step.
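
Written out (with stage cost ℓ(x, u) and dynamics x_{k+1} = f(x_k, u_k), symbols introduced here for concreteness), the backward recursion is:

  V_N(x) = terminal_cost(x)
  V_k(x) = min_u [ ℓ(x, u) + V_{k+1}(f(x, u)) ],   k = N-1, ..., 1

and the policy at step k is the minimizing u.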

Example: Applying to LQR

  • Initialization: At the final time step N, set V_N(x) = terminal_cost(x) = x^T Q_N x.
  • Backward Recursion: At each earlier step, solve a quadratic minimization problem to obtain the feedback policy and the value function.
  • Riccati Equation: This yields a matrix recursion for the P matrices that define the value function.
  • Control Policy: Given by u = -K x, where the gains K come from the Riccati recursion (see the sketch after this list).
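
A minimal Python/NumPy sketch of this finite-horizon backward pass (the matrices are the same assumed double-integrator values as above, and Q_N is a made-up terminal weight):

```python
import numpy as np

dt, N = 0.1, 50
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.eye(2)            # stage state cost
R = np.array([[0.1]])    # stage control cost
QN = 10.0 * np.eye(2)    # terminal cost weight (illustrative)

# Backward Riccati recursion: V_k(x) = x^T P_k x, policy u_k = -K_k x.
P = [None] * (N + 1)
K = [None] * N
P[N] = QN
for k in range(N - 1, -1, -1):
    K[k] = np.linalg.solve(R + B.T @ P[k + 1] @ B, B.T @ P[k + 1] @ A)
    P[k] = Q + A.T @ P[k + 1] @ (A - B @ K[k])

# Roll out the closed-loop system from an initial state.
x = np.array([[1.0], [0.0]])
for k in range(N):
    x = A @ x + B @ (-K[k] @ x)
print("final state:", x.ravel())  # driven toward the origin
```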

Generalization to Nonlinear Systems

  • Approximate Dynamic Programming: For general nonlinear systems the value function cannot be computed exactly, but dynamic programming remains feasible with function approximation.
  • Stochastic Systems: The same machinery generalizes by wrapping the Bellman recursion in expectation operators (written out below).
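
In symbols (with w denoting process noise, a symbol introduced here for concreteness):

  V_k(x) = min_u E_w[ ℓ(x, u) + V_{k+1}(f(x, u, w)) ]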

Curse of Dimensionality

  • In high-dimensional state spaces, explicitly computing (e.g., tabulating) the value function is infeasible: the computational cost grows exponentially with the state dimension.

Lagrange Multipliers and Co-States

  • Connection: The Lagrange multipliers λ (co-states) from trajectory optimization correspond to the gradient of the cost-to-go function.
  • Practical Insight: For LQR this means λ_k = P_k x_k along the optimal trajectory, which can be checked numerically (see the sketch after this list).
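
A self-contained Python/NumPy check of this relationship, reusing the assumed double-integrator problem from the earlier sketches (with the quadratic costs written with 1/2 factors, KKT stationarity gives λ_N = Q_N x_N and λ_k = Q x_k + A^T λ_{k+1}; the sketch verifies λ_k = P_k x_k along the optimal trajectory):

```python
import numpy as np

dt, N = 0.1, 50
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q, R, QN = np.eye(2), np.array([[0.1]]), 10.0 * np.eye(2)

# Backward Riccati pass (as in the LQR sketch above).
P, K = [None] * (N + 1), [None] * N
P[N] = QN
for k in range(N - 1, -1, -1):
    K[k] = np.linalg.solve(R + B.T @ P[k + 1] @ B, B.T @ P[k + 1] @ A)
    P[k] = Q + A.T @ P[k + 1] @ (A - B @ K[k])

# Roll out the optimal trajectory under u_k = -K_k x_k.
xs = [np.array([[1.0], [0.0]])]
for k in range(N):
    xs.append((A - B @ K[k]) @ xs[k])

# Adjoint recursion for the co-states, checked against P_k x_k.
lam = QN @ xs[N]                   # lambda_N = Q_N x_N
for k in range(N - 1, 0, -1):
    lam = Q @ xs[k] + A.T @ lam    # lambda_k = Q x_k + A^T lambda_{k+1}
    assert np.allclose(lam, P[k] @ xs[k])
print("verified lambda_k = P_k x_k for k = 1, ..., N-1")
```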

Summary

  • Key Takeaways: LQR, controllability conditions, dynamic programming, the Riccati recursion, and the role of the cost-to-go in deriving control policies.
  • Applications: Understanding these concepts is crucial for stabilization, trajectory optimization, and various control scenarios.

Q&A

  • Addressed practical implications and walked through numerical examples to reinforce the lecture's concepts.