Convex MPC and Nonlinear Trajectory Optimization

Jul 16, 2024

Recap of Previous Discussion

  • Convex MPC Applications: Reviewed convex MPC applications and case studies (legged locomotion on quadrupeds, rocket soft landing).
  • Nonlinear Trajectory Optimization: Defined the problem and derived the differential dynamic programming (DDP) and iLQR algorithms in detail.

Current Focus

  • Recap of Nonlinear Trajectory Optimization: A simplified summary of the previous derivation, aimed at understanding and implementation.
  • Handling Constraints in DDP: How to incorporate actuator and state constraints into DDP, as well as how to handle free- and minimum-time problems.

Key Points on DDP Recap

  • Unconstrained Nonlinear Trajectory Optimization Problem: Minimize a cost with stage and terminal terms, J = ℓ_N(x_N) + Σ_k ℓ(x_k, u_k), subject to the nonlinear dynamics x_{k+1} = f(x_k, u_k).
  • Backward Pass: A Bellman backup that mirrors the LQR recursion but is tailored to nonlinear dynamics, using a second-order Taylor expansion of the value function (see the sketch after this list).
    • Stages:
      • Value function approximation (second-order Taylor expansion).
      • Terminal cost to initiate backward pass.
      • Minimize second-order action-value function (Q-function).
      • Solution involves feedforward (d) and feedback (K) terms, resembling the LQR solution.
      • Value function updated with new control corrections.
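
For concreteness, here is a minimal Gauss-Newton (iLQR) backward pass in Python. It is a sketch, assuming the dynamics Jacobians A_k = ∂f/∂x, B_k = ∂f/∂u and the cost derivatives have already been evaluated along the nominal trajectory; all names here are illustrative, not from the lecture.

```python
import numpy as np

def backward_pass(A, B, lx, lu, lxx, luu, phix, phixx):
    """Gauss-Newton (iLQR) backward pass.

    A[k], B[k]  : dynamics Jacobians df/dx, df/du at step k
    lx, lu      : stage-cost gradients; lxx, luu: stage-cost Hessians
    phix, phixx : terminal-cost gradient and Hessian
    Returns feedforward terms d, feedback gains K, and the expected
    cost decrease dJ used by the line search.
    """
    n, m = B[0].shape
    N = len(A) + 1
    d = [np.zeros(m) for _ in range(N - 1)]
    K = [np.zeros((m, n)) for _ in range(N - 1)]
    dJ = 0.0

    # Initialize the value-function expansion with the terminal cost.
    p, P = phix, phixx
    for k in reversed(range(N - 1)):
        # Second-order expansion of the action-value (Q) function.
        gx = lx[k] + A[k].T @ p
        gu = lu[k] + B[k].T @ p
        Gxx = lxx[k] + A[k].T @ P @ A[k]
        Guu = luu[k] + B[k].T @ P @ B[k]
        Gux = B[k].T @ P @ A[k]

        # Minimize over u: affine policy delta_u = -d[k] - K[k] @ delta_x.
        d[k] = np.linalg.solve(Guu, gu)
        K[k] = np.linalg.solve(Guu, Gux)
        dJ += gu @ d[k]

        # Propagate the value-function expansion backward one step.
        p = gx - K[k].T @ gu + K[k].T @ Guu @ d[k] - Gux.T @ d[k]
        P = Gxx + K[k].T @ Guu @ K[k] - Gux.T @ K[k] - K[k].T @ Gux
    return d, K, dJ
```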

Forward Pass

  • Forward Rollout with Line Search: Roll the dynamics forward with the updated controls, using a line search on the feedforward term (see the sketch after this list).
    • The change in cost is checked against the Armijo condition (backtracking line search) before a step is accepted.
    • The line-search parameter α scales only the feedforward term: u_k = ū_k − α·d_k − K_k·(x_k − x̄_k).
    • Iterate backward and forward passes until convergence, i.e., until the corrections become sufficiently small.
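
A matching rollout sketch, under the same assumptions as the backward pass above; `f` (the discrete dynamics), `cost`, and the Armijo tolerance `b` are placeholders.

```python
def forward_pass(f, cost, xbar, ubar, d, K, dJ, b=1e-2):
    """Roll out updated controls with a backtracking (Armijo) line search.

    f(x, u)    : discrete dynamics; cost(xs, us): total trajectory cost
    xbar, ubar : nominal state/control trajectories from the last iteration
    d, K, dJ   : feedforward terms, feedback gains, and expected decrease
                 returned by the backward pass
    """
    J, alpha = cost(xbar, ubar), 1.0
    for _ in range(20):  # cap the number of backtracking steps
        xs, us = [xbar[0]], []
        for k in range(len(ubar)):
            # alpha scales only the feedforward correction.
            u = ubar[k] - alpha * d[k] - K[k] @ (xs[k] - xbar[k])
            us.append(u)
            xs.append(f(xs[k], u))
        Jnew = cost(xs, us)
        # Armijo condition: accept once the achieved decrease is a
        # sufficient fraction of the predicted decrease.
        if J - Jnew >= b * alpha * dJ:
            return xs, us, Jnew
        alpha *= 0.5  # backtrack
    return xbar, ubar, J  # no acceptable step found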

Examples and Implementation

  • Cart-Pole Example: A cart with a pendulum; the objective is to swing the pendulum up to the unstable upright position.
    • Implementation iterates between rolling out the dynamics and evaluating costs (see the dynamics sketch after this list).
    • Comparison: Gauss-Newton iLQR converged in 691 iterations versus 552 for full DDP, but each full-DDP iteration is more expensive, so it can be slower in wall-clock time.
  • Acrobot Example: A double pendulum actuated only at the elbow; the first (shoulder) joint is unactuated.
    • The same iLQR algorithm handles both balancing and swing-up tasks.
    • Because the optimization is non-convex, the solver may converge to different local optima depending on the initial guess.
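
A sketch of the cart-pole dynamics in the standard textbook form (θ = 0 hanging down), plus an RK4 discretization that can play the role of `f` in the rollout above; the masses, lengths, and step size are illustrative choices, not values from the lecture.

```python
import numpy as np

def cartpole_dynamics(x, u, mc=1.0, mp=0.2, l=0.5, g=9.81):
    """Continuous-time cart-pole dynamics.

    State x = [pos, theta, pos_dot, theta_dot], theta = 0 hanging down;
    control u = [horizontal force on the cart].
    """
    _, th, pd, thd = x  # cart position does not enter the dynamics
    s, c = np.sin(th), np.cos(th)
    den = mc + mp * s**2
    pdd = (u[0] + mp * s * (l * thd**2 + g * c)) / den
    thdd = (-u[0] * c - mp * l * thd**2 * c * s - (mc + mp) * g * s) / (l * den)
    return np.array([pd, thd, pdd, thdd])

def rk4_step(x, u, h=0.05):
    """One explicit RK4 step: the discrete dynamics x_{k+1} = f(x_k, u_k)."""
    k1 = cartpole_dynamics(x, u)
    k2 = cartpole_dynamics(x + 0.5 * h * k1, u)
    k3 = cartpole_dynamics(x + 0.5 * h * k2, u)
    k4 = cartpole_dynamics(x + h * k3, u)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
```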

Discussion on DDP and Constraints

  • Advantages:
    • Fast convergence and efficient computation.
    • Every iterate is dynamically feasible: rollouts always obey the system dynamics.
    • The feedback gains K_k give a time-varying LQR tracking controller for stabilization essentially for free.
  • Disadvantages:
    • Does not natively handle constraints (torque limits, state constraints).
    • Hard to initialize in problems where a feasible state-trajectory guess is needed (e.g., obstacle-rich environments), since DDP is warm-started with controls, not states.
    • Can become numerically ill-conditioned on long trajectories, since the backward recursion propagates through many time steps.

Handling Constraints

  • Torque Limits:
    • Simple hack: pass the controls through a squashing function (e.g., tanh) to enforce the limits.
    • Better method: solve a box-constrained QP for the controls in the backward pass (control-limited DDP).
  • State Constraints:
    • Penalty methods are discouraged due to ill-conditioning.
    • Preferred method: wrap DDP in an augmented Lagrangian (AL) framework (see the sketches after this list).
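
As a minimal illustration of the squashing approach, the optimizer works with an unconstrained variable that is mapped into the admissible box:

```python
import numpy as np

def squash(u_raw, u_max):
    # tanh squashing: maps an unconstrained input into (-u_max, u_max).
    return u_max * np.tanh(u_raw / u_max)
```

And a sketch of the augmented Lagrangian wrapper for inequality constraints c(xs, us) ≤ 0, assuming some unconstrained DDP/iLQR solver `solve_ddp`; the interface and parameter names are illustrative, not a specific library's API.

```python
def al_ddp(solve_ddp, J, c, x0, us, n_con, rho=1.0, phi=10.0, iters=10, tol=1e-4):
    """Outer augmented Lagrangian loop around an unconstrained DDP solver.

    solve_ddp(cost, x0, us): unconstrained solver returning (xs, us)
    J(xs, us): original cost; c(xs, us): stacked inequalities (want <= 0)
    n_con: total number of constraints
    """
    lam = np.zeros(n_con)  # Lagrange multiplier estimates
    for _ in range(iters):
        # AL cost: original cost plus a smooth penalty that only
        # activates for violated (or nearly active) constraints.
        def al_cost(xs, us, lam=lam, rho=rho):
            v = np.maximum(0.0, lam / rho + c(xs, us))
            return J(xs, us) + 0.5 * rho * v @ v
        xs, us = solve_ddp(al_cost, x0, us)
        # First-order multiplier update, then increase the penalty weight.
        lam = np.maximum(0.0, lam + rho * c(xs, us))
        if np.all(c(xs, us) <= tol):
            break
        rho *= phi
    return xs, us
```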

Conclusion and Next Steps

  • Discussion of free- and minimum-time problems is postponed to the next lecture.

References

  • Recommended paper: Tassa, Mansard, and Todorov, "Control-Limited Differential Dynamic Programming" (ICRA 2014), for an in-depth treatment of the box-constrained QP approach to control limits in DDP.