Lecture on Linear Quadratic Regulator (LQR), Riccati Recursion, and Dynamic Programming

Jul 16, 2024

Recap

  • Last time, we discussed LQR formulated as a Quadratic Program (QP).
  • Introduced the Riccati recursion as a method to solve LQR efficiently by exploiting the block sparsity of the QP.
  • Noted that the Riccati recursion yields a feedback policy, so the solution applies across different initial conditions.

Topics for Today

Infinite Horizon LQR

  • Definition: The LQR problem as the time horizon goes to infinity; the feedback gain matrix K converges to a constant steady-state value.
  • Relation to Systems: Useful for stabilization problems such as balancing a cart-pole system.
  • Toolboxes: MATLAB, Julia, and Python provide functions (e.g., dlqr and dare in MATLAB) to compute the infinite-horizon K matrix.
  • Methods: Iterating the Riccati recursion to a fixed point, or Newton-type iterations, can be used to compute the steady-state solution (see the sketch after this list).
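
As a concrete illustration, here is a minimal Python/NumPy sketch (the double-integrator A, B and the cost weights Q, R are made-up values, not from the lecture) that finds the steady-state gain by running the Riccati recursion until it converges, then cross-checks against SciPy's discrete algebraic Riccati equation solver:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Illustrative double-integrator dynamics and cost weights (assumed values).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.eye(2)          # state cost weight
R = np.array([[0.1]])  # control cost weight

# Run the Riccati recursion backwards until P reaches a fixed point.
P = Q.copy()
for _ in range(10_000):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P_next = Q + A.T @ P @ (A - B @ K)
    if np.allclose(P_next, P, atol=1e-10):
        break
    P = P_next

# Cross-check against the discrete algebraic Riccati equation (DARE) solver.
P_dare = solve_discrete_are(A, B, Q, R)
K_dare = np.linalg.solve(R + B.T @ P_dare @ B, B.T @ P_dare @ A)
print("steady-state K:", K)
print("matches DARE solution:", np.allclose(K, K_dare, atol=1e-6))
```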

Controllability

  • Controllability Conditions: Conditions on the system matrices A and B that guarantee the system can be driven to any state, which LQR requires.
  • Under-Determined Linear System: Reaching a target state in n steps amounts to solving an under-determined linear system for the control inputs, e.g., via the pseudo-inverse.
  • Controllability Matrix: Built from the system's dynamics over n time steps: C = [B, AB, A^2 B, ..., A^(n-1) B].
  • Full Rank Requirement: C must be full rank (rank C = n, where n is the dimension of the state vector); a rank-check sketch follows this list.
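
A minimal sketch of this rank test in Python/NumPy (the helper name and the example matrices are my own, for illustration):

```python
import numpy as np

def controllability_matrix(A, B):
    """Stack [B, AB, A^2 B, ..., A^(n-1) B] horizontally."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

# Illustrative double-integrator example (same assumed matrices as above).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])

C = controllability_matrix(A, B)
print("rank C =", np.linalg.matrix_rank(C))                  # 2 here
print("controllable:", np.linalg.matrix_rank(C) == A.shape[0])
```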

Dynamic Programming

  • Bellman's Principle of Optimality: Sub-trajectories of optimal trajectories are themselves optimal; equivalently, the cost-to-go from any state x at time k along an optimal trajectory is minimal.
  • Optimal Cost-to-Go/Value Function: Denoted V(x); encodes the minimum cost to reach the goal from state x when acting optimally.
  • Backward Recursion: The value function is computed by starting from the terminal cost and working backwards in time (written out below).
  • Policy: Obtained by minimizing the stage cost plus the cost-to-go at each step.
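
Written out (with stage cost ℓ(x, u) and dynamics x_{k+1} = f(x_k, u_k), symbols introduced here for concreteness), the backward recursion is:

  V_N(x) = terminal_cost(x)
  V_k(x) = min_u [ ℓ(x, u) + V_{k+1}(f(x, u)) ],   k = N-1, ..., 1

and the policy at step k is the minimizing u.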

Example: Applying to LQR

  • Initialization: At the final time step N, set V_N(x) = terminal_cost(x) = x^T Q_N x.
  • Backward Recursion: At each earlier step, solve a quadratic minimization problem to obtain the feedback policy and the value function.
  • Riccati Equation: This yields a matrix recursion for the P matrices that define the value function.
  • Control Policy: Given by u = -K x, where the gains K come from the Riccati recursion (see the sketch after this list).
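
A minimal Python/NumPy sketch of this finite-horizon backward pass (the matrices are the same assumed double-integrator values as above, and Q_N is a made-up terminal weight):

```python
import numpy as np

dt, N = 0.1, 50
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.eye(2)            # stage state cost
R = np.array([[0.1]])    # stage control cost
QN = 10.0 * np.eye(2)    # terminal cost weight (illustrative)

# Backward Riccati recursion: V_k(x) = x^T P_k x, policy u_k = -K_k x.
P = [None] * (N + 1)
K = [None] * N
P[N] = QN
for k in range(N - 1, -1, -1):
    K[k] = np.linalg.solve(R + B.T @ P[k + 1] @ B, B.T @ P[k + 1] @ A)
    P[k] = Q + A.T @ P[k + 1] @ (A - B @ K[k])

# Roll out the closed-loop system from an initial state.
x = np.array([[1.0], [0.0]])
for k in range(N):
    x = A @ x + B @ (-K[k] @ x)
print("final state:", x.ravel())  # driven toward the origin
```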

Generalization to Nonlinear Systems

  • Approximate Dynamic Programming: For general nonlinear systems the value function cannot be computed exactly, but dynamic programming remains feasible with function approximation.
  • Stochastic Systems: The same machinery generalizes by wrapping the Bellman recursion in expectation operators (written out below).
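
In symbols (with w denoting process noise, a symbol introduced here for concreteness):

  V_k(x) = min_u E_w[ ℓ(x, u) + V_{k+1}(f(x, u, w)) ]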

Curse of Dimensionality

  • In high-dimensional state spaces, explicitly computing (e.g., tabulating) the value function is infeasible: the computational cost grows exponentially with the state dimension.

Lagrange Multipliers and Co-States

  • Connection: The Lagrange multipliers λ (co-states) from trajectory optimization correspond to the gradient of the cost-to-go function.
  • Practical Insight: For LQR this means λ_k = P_k x_k along the optimal trajectory, which can be checked numerically (see the sketch after this list).
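
A self-contained Python/NumPy check of this relationship, reusing the assumed double-integrator problem from the earlier sketches (with the quadratic costs written with 1/2 factors, KKT stationarity gives λ_N = Q_N x_N and λ_k = Q x_k + A^T λ_{k+1}; the sketch verifies λ_k = P_k x_k along the optimal trajectory):

```python
import numpy as np

dt, N = 0.1, 50
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q, R, QN = np.eye(2), np.array([[0.1]]), 10.0 * np.eye(2)

# Backward Riccati pass (as in the LQR sketch above).
P, K = [None] * (N + 1), [None] * N
P[N] = QN
for k in range(N - 1, -1, -1):
    K[k] = np.linalg.solve(R + B.T @ P[k + 1] @ B, B.T @ P[k + 1] @ A)
    P[k] = Q + A.T @ P[k + 1] @ (A - B @ K[k])

# Roll out the optimal trajectory under u_k = -K_k x_k.
xs = [np.array([[1.0], [0.0]])]
for k in range(N):
    xs.append((A - B @ K[k]) @ xs[k])

# Adjoint recursion for the co-states, checked against P_k x_k.
lam = QN @ xs[N]                   # lambda_N = Q_N x_N
for k in range(N - 1, 0, -1):
    lam = Q @ xs[k] + A.T @ lam    # lambda_k = Q x_k + A^T lambda_{k+1}
    assert np.allclose(lam, P[k] @ xs[k])
print("verified lambda_k = P_k x_k for k = 1, ..., N-1")
```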

Summary

  • Key Takeaways: LQR, controllability conditions, dynamic programming, the Riccati recursion, and the role of the cost-to-go in deriving control policies.
  • Applications: Understanding these concepts is crucial for stabilization, trajectory optimization, and various control scenarios.

Q&A

  • Addressed practical implications and walked through numerical examples to reinforce the lecture's concepts.