Convex MPC and Nonlinear Trajectory Optimization

Jul 16, 2024

Recap of Previous Discussion

  • Convex MPC Applications: Reviewed convex MPC applications and case studies (legged locomotion on quadrupeds, rocket soft landing).
  • Nonlinear Trajectory Optimization: Defined the problem and derived the differential dynamic programming (DDP) and iLQR algorithms in detail.

Current Focus

  • Recap of Nonlinear Trajectory Optimization: A simplified summary of the previous derivation, aimed at understanding and implementation.
  • Handling Constraints in DDP: How to incorporate actuator and state constraints into DDP, as well as how to handle free- and minimum-time problems.

Key Points on DDP Recap

  • Unconstrained Nonlinear Trajectory Optimization Problem: Minimize a cost with stage and terminal terms, J = ℓ_N(x_N) + Σ_k ℓ(x_k, u_k), subject to the nonlinear dynamics x_{k+1} = f(x_k, u_k).
  • Backward Pass: A Bellman backup that mirrors the LQR recursion but is tailored to nonlinear dynamics, using a second-order Taylor expansion of the value function (see the sketch after this list).
    • Stages:
      • Value function approximation (second-order Taylor expansion).
      • Terminal cost to initiate backward pass.
      • Minimize second-order action-value function (Q-function).
      • Solution involves feedforward (d) and feedback (K) terms, resembling the LQR solution.
      • Value function updated with new control corrections.
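
For concreteness, here is a minimal Gauss-Newton (iLQR) backward pass in Python. It is a sketch, assuming the dynamics Jacobians A_k = ∂f/∂x, B_k = ∂f/∂u and the cost derivatives have already been evaluated along the nominal trajectory; all names here are illustrative, not from the lecture.

```python
import numpy as np

def backward_pass(A, B, lx, lu, lxx, luu, phix, phixx):
    """Gauss-Newton (iLQR) backward pass.

    A[k], B[k]  : dynamics Jacobians df/dx, df/du at step k
    lx, lu      : stage-cost gradients; lxx, luu: stage-cost Hessians
    phix, phixx : terminal-cost gradient and Hessian
    Returns feedforward terms d, feedback gains K, and the expected
    cost decrease dJ used by the line search.
    """
    n, m = B[0].shape
    N = len(A) + 1
    d = [np.zeros(m) for _ in range(N - 1)]
    K = [np.zeros((m, n)) for _ in range(N - 1)]
    dJ = 0.0

    # Initialize the value-function expansion with the terminal cost.
    p, P = phix, phixx
    for k in reversed(range(N - 1)):
        # Second-order expansion of the action-value (Q) function.
        gx = lx[k] + A[k].T @ p
        gu = lu[k] + B[k].T @ p
        Gxx = lxx[k] + A[k].T @ P @ A[k]
        Guu = luu[k] + B[k].T @ P @ B[k]
        Gux = B[k].T @ P @ A[k]

        # Minimize over u: affine policy delta_u = -d[k] - K[k] @ delta_x.
        d[k] = np.linalg.solve(Guu, gu)
        K[k] = np.linalg.solve(Guu, Gux)
        dJ += gu @ d[k]

        # Propagate the value-function expansion backward one step.
        p = gx - K[k].T @ gu + K[k].T @ Guu @ d[k] - Gux.T @ d[k]
        P = Gxx + K[k].T @ Guu @ K[k] - Gux.T @ K[k] - K[k].T @ Gux
    return d, K, dJ
```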

Forward Pass

  • Forward Rollout with Line Search: Roll the dynamics forward with the updated controls, using a line search on the feedforward term (see the sketch after this list).
    • The change in cost is checked against the Armijo condition (backtracking line search) before a step is accepted.
    • The line-search parameter α scales only the feedforward term: u_k = ū_k − α·d_k − K_k·(x_k − x̄_k).
    • Iterate backward and forward passes until convergence, i.e., until the corrections become sufficiently small.
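
A matching rollout sketch, under the same assumptions as the backward pass above; `f` (the discrete dynamics), `cost`, and the Armijo tolerance `b` are placeholders.

```python
def forward_pass(f, cost, xbar, ubar, d, K, dJ, b=1e-2):
    """Roll out updated controls with a backtracking (Armijo) line search.

    f(x, u)    : discrete dynamics; cost(xs, us): total trajectory cost
    xbar, ubar : nominal state/control trajectories from the last iteration
    d, K, dJ   : feedforward terms, feedback gains, and expected decrease
                 returned by the backward pass
    """
    J, alpha = cost(xbar, ubar), 1.0
    for _ in range(20):  # cap the number of backtracking steps
        xs, us = [xbar[0]], []
        for k in range(len(ubar)):
            # alpha scales only the feedforward correction.
            u = ubar[k] - alpha * d[k] - K[k] @ (xs[k] - xbar[k])
            us.append(u)
            xs.append(f(xs[k], u))
        Jnew = cost(xs, us)
        # Armijo condition: accept once the achieved decrease is a
        # sufficient fraction of the predicted decrease.
        if J - Jnew >= b * alpha * dJ:
            return xs, us, Jnew
        alpha *= 0.5  # backtrack
    return xbar, ubar, J  # no acceptable step found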

Examples and Implementation

  • Cart-Pole Example: A cart with a pendulum; the objective is to swing the pendulum up to the unstable upright position.
    • Implementation iterates between rolling out the dynamics and evaluating costs (see the dynamics sketch after this list).
    • Comparison: Gauss-Newton iLQR converged in 691 iterations versus 552 for full DDP, but each full-DDP iteration is more expensive, so it can be slower in wall-clock time.
  • Acrobot Example: A double pendulum actuated only at the elbow; the first (shoulder) joint is unactuated.
    • The same iLQR algorithm handles both balancing and swing-up tasks.
    • Because the optimization is non-convex, the solver may converge to different local optima depending on the initial guess.
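
A sketch of the cart-pole dynamics in the standard textbook form (θ = 0 hanging down), plus an RK4 discretization that can play the role of `f` in the rollout above; the masses, lengths, and step size are illustrative choices, not values from the lecture.

```python
import numpy as np

def cartpole_dynamics(x, u, mc=1.0, mp=0.2, l=0.5, g=9.81):
    """Continuous-time cart-pole dynamics.

    State x = [pos, theta, pos_dot, theta_dot], theta = 0 hanging down;
    control u = [horizontal force on the cart].
    """
    _, th, pd, thd = x  # cart position does not enter the dynamics
    s, c = np.sin(th), np.cos(th)
    den = mc + mp * s**2
    pdd = (u[0] + mp * s * (l * thd**2 + g * c)) / den
    thdd = (-u[0] * c - mp * l * thd**2 * c * s - (mc + mp) * g * s) / (l * den)
    return np.array([pd, thd, pdd, thdd])

def rk4_step(x, u, h=0.05):
    """One explicit RK4 step: the discrete dynamics x_{k+1} = f(x_k, u_k)."""
    k1 = cartpole_dynamics(x, u)
    k2 = cartpole_dynamics(x + 0.5 * h * k1, u)
    k3 = cartpole_dynamics(x + 0.5 * h * k2, u)
    k4 = cartpole_dynamics(x + h * k3, u)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
```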

Discussion on DDP and Constraints

  • Advantages:
    • Fast convergence and efficient computation.
    • Every iterate is dynamically feasible: rollouts always obey the system dynamics.
    • The feedback gains K_k give a time-varying LQR tracking controller for stabilization essentially for free.
  • Disadvantages:
    • Does not natively handle constraints (torque limits, state constraints).
    • Hard to initialize in problems where a feasible state-trajectory guess is needed (e.g., obstacle-rich environments), since DDP is warm-started with controls, not states.
    • Can become numerically ill-conditioned on long trajectories, since the backward recursion propagates through many time steps.

Handling Constraints

  • Torque Limits:
    • Simple hack: pass the controls through a squashing function (e.g., tanh) to enforce the limits.
    • Better method: solve a box-constrained QP for the controls in the backward pass (control-limited DDP).
  • State Constraints:
    • Penalty methods are discouraged due to ill-conditioning.
    • Preferred method: wrap DDP in an augmented Lagrangian (AL) framework (see the sketches after this list).
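
As a minimal illustration of the squashing approach, the optimizer works with an unconstrained variable that is mapped into the admissible box:

```python
import numpy as np

def squash(u_raw, u_max):
    # tanh squashing: maps an unconstrained input into (-u_max, u_max).
    return u_max * np.tanh(u_raw / u_max)
```

And a sketch of the augmented Lagrangian wrapper for inequality constraints c(xs, us) ≤ 0, assuming some unconstrained DDP/iLQR solver `solve_ddp`; the interface and parameter names are illustrative, not a specific library's API.

```python
def al_ddp(solve_ddp, J, c, x0, us, n_con, rho=1.0, phi=10.0, iters=10, tol=1e-4):
    """Outer augmented Lagrangian loop around an unconstrained DDP solver.

    solve_ddp(cost, x0, us): unconstrained solver returning (xs, us)
    J(xs, us): original cost; c(xs, us): stacked inequalities (want <= 0)
    n_con: total number of constraints
    """
    lam = np.zeros(n_con)  # Lagrange multiplier estimates
    for _ in range(iters):
        # AL cost: original cost plus a smooth penalty that only
        # activates for violated (or nearly active) constraints.
        def al_cost(xs, us, lam=lam, rho=rho):
            v = np.maximum(0.0, lam / rho + c(xs, us))
            return J(xs, us) + 0.5 * rho * v @ v
        xs, us = solve_ddp(al_cost, x0, us)
        # First-order multiplier update, then increase the penalty weight.
        lam = np.maximum(0.0, lam + rho * c(xs, us))
        if np.all(c(xs, us) <= tol):
            break
        rho *= phi
    return xs, us
```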

Conclusion and Next Steps

  • Discussion of free- and minimum-time problems is postponed to the next lecture.

References

  • Recommended paper: Tassa, Mansard, and Todorov, "Control-Limited Differential Dynamic Programming" (ICRA 2014), for an in-depth treatment of the box-constrained QP approach to control limits in DDP.