Exploring Model Predictive Control and Chess

Sep 11, 2024

Notes on Model Predictive Control and Reinforcement Learning Lecture

Introduction

  • Speaker thanks the organizers for the invitation.
  • The lecture is based on a course at ASU and recent publications related to reinforcement learning.
  • Relevant books are available as PDFs on the speaker's website.

Lecture Outline

  • Connection between Reinforcement Learning (RL) and Model Predictive Control (MPC).
  • Use of computer chess as a case study.
  • Introduction of a new theoretical framework for MPC based on Newton's method.
  • Focus on:
    • One-step look-ahead.
    • Multi-step look-ahead.
    • Stability issues, rollout, and policy iteration algorithms.
    • Recent applications, particularly in computer chess.

Computer Chess

  • Chess has evolved significantly since AlphaZero's introduction in 2017.
  • AlphaZero uses two algorithms:
    1. Offline Training Algorithm: Learns position evaluation before playing.
    2. Online Play Algorithm: Uses multi-step look-ahead and position evaluation.
  • Focus on the importance of the online play algorithm.

Model Predictive Control (MPC)

  • MPC involves:
    • L-step look-ahead optimization.
    • Cost function approximation.
  • Deterministic optimal control problem is considered:
    • The state x_k evolves under the control u_k according to a system equation x_{k+1} = f(x_k, u_k).
    • A cost g(x_k, u_k) is incurred at each stage.
  • Classical form of MPC:
    • Solve the L-step look-ahead problem, apply only the first control, discard the rest, and repeat from the next state (a minimal sketch follows this list).
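
A minimal Python sketch of this receding-horizon loop, for a deterministic problem with a small discrete control set. The dynamics f, stage cost g, terminal approximation J_tilde, and the brute-force enumeration of control sequences are illustrative placeholders, not from the lecture:

    from itertools import product

    def mpc_control(x, f, g, J_tilde, controls, L):
        """One MPC decision: L-step look-ahead with a terminal cost approximation.

        Enumerates every control sequence of length L, which is only viable for
        small discrete control sets; real MPC solvers optimize this search.
        """
        best_cost, best_first_control = float("inf"), None
        for seq in product(controls, repeat=L):
            cost, state = 0.0, x
            for u in seq:                    # accumulate stage costs over the horizon
                cost += g(state, u)
                state = f(state, u)
            cost += J_tilde(state)           # cost approximation at the end of the horizon
            if cost < best_cost:
                best_cost, best_first_control = cost, seq[0]
        return best_first_control            # apply only the first control

    def run_mpc(x0, f, g, J_tilde, controls, L, steps):
        """Receding-horizon loop: re-solve the look-ahead problem at every state."""
        x, trajectory = x0, [x0]
        for _ in range(steps):
            u = mpc_control(x, f, g, J_tilde, controls, L)
            x = f(x, u)
            trajectory.append(x)
        return trajectory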

Comparisons Between MPC and Computer Chess

  • Similarities in approach:
    • Both involve multi-step look-ahead followed by a terminal cost (position) evaluation.
  • Differences:
    • Chess has discrete state and control spaces; MPC often has continuous spaces.
    • Chess look-ahead trees are pruned for efficiency, whereas MPC optimizes exactly.
  • Viewing chess as a one-player game against a fixed opponent.

Principal Viewpoints

  • Online play with one-step look-ahead acts as a Newton step for solving the Bellman equation (spelled out below).
  • Offline training provides the starting point of the Newton step; additional look-ahead steps further improve that starting point.
  • The framework applies to various problems, including stochastic, deterministic, and multi-agent systems.
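
In standard dynamic programming notation (assumed here, not copied from the slides), the viewpoint reads as follows. With the Bellman operator

    \[
    (TJ)(x) = \min_u \bigl[\, g(x,u) + J(f(x,u)) \,\bigr], \qquad J^* = TJ^* \ \text{(Bellman equation)},
    \]

one-step look-ahead from an offline-trained approximation \(\tilde J\) selects the policy \(\tilde\mu\) satisfying \(T_{\tilde\mu}\tilde J = T\tilde J\), and the cost \(J_{\tilde\mu}\) of playing \(\tilde\mu\) online is, under suitable conditions, the result of a single Newton step for the fixed point equation \(J = TJ\) started at \(\tilde J\).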

Newton's Theoretical Framework

  • Focus on linear quadratic problems for visualization.
  • Optimal cost function is quadratic; optimal control is linear.
  • The optimal cost function is obtained by solving the Riccati equation.
  • Value iteration can be used to solve the Riccati equation iteratively (illustrated below).
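
To make this concrete, here is a small Python illustration of value iteration applied to the scalar Riccati operator. The scalar system x_{k+1} = a x_k + b u_k with stage cost q x_k^2 + r u_k^2, and the particular numbers a = 2, b = 1, q = 1, r = 1, are an assumed example, not taken from the lecture:

    def riccati_operator(K, a=2.0, b=1.0, q=1.0, r=1.0):
        """Riccati operator for the scalar system x_{k+1} = a*x_k + b*u_k with
        stage cost q*x_k**2 + r*u_k**2; its fixed point K* gives the optimal
        cost function K* * x**2.  Here a = 2, so the open-loop system is unstable."""
        return a**2 * r * K / (r + b**2 * K) + q

    # Value iteration: repeated application of the Riccati operator,
    # starting from K = 0 (zero terminal cost approximation).
    K = 0.0
    for _ in range(30):
        K = riccati_operator(K)
    print(f"value iteration estimate of K*: {K:.6f}")   # converges to 2 + sqrt(5) ~ 4.236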

Visualization and Iteration

  • Graphical representation of the Riccati equation’s solution through iteration.
  • Linear policies are represented as lines in the graph; stable policies correspond to lines with slope less than 1.
  • The Riccati operator is the lower envelope of the policy lines (made precise below).
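
In the scalar linear quadratic case this picture can be spelled out (standard notation, an assumed illustration rather than the speaker's exact slides): a linear policy \(\mu(x) = Lx\) has quadratic cost \(K_\mu x^2\), where \(K_\mu\) is the fixed point of the affine function

    \[
    F_L(K) = q + rL^2 + (a + bL)^2 K .
    \]

Each policy thus corresponds to a line of slope \((a + bL)^2\), which is below 1 exactly when the closed-loop system is stable (\(|a + bL| < 1\)), and the Riccati operator is the pointwise minimum of these lines over L, which is why it is concave.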

One-Step vs Multi-Step Look-Ahead

  • L-step look-ahead can be viewed as one-step look-ahead applied to the cost function obtained from the terminal approximation by L-1 value iterations (see below).
  • Importance of accurately performing the first step of look-ahead for good convergence properties.
  • Certainty equivalence can simplify calculations in stochastic problems, especially for the look-ahead steps after the first.
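
In the notation used earlier, L-step look-ahead with terminal approximation \(\tilde J\) selects the policy \(\tilde\mu\) with

    \[
    T_{\tilde\mu}\bigl(T^{L-1}\tilde J\bigr) = T\bigl(T^{L-1}\tilde J\bigr) = T^{L}\tilde J,
    \]

so the Newton step is taken from the improved starting point \(T^{L-1}\tilde J\): the first minimization is the Newton step and should be carried out accurately, while the remaining L-1 steps only preprocess the starting point and can be simplified, e.g., via certainty equivalence in stochastic problems.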

Stability Issues

  • Stability of MPC policies varies based on cost function approximations.
  • Regions of stability and instability defined by slopes of corresponding MPC policies.
  • Role of rollout algorithms in producing stable policies from stable base policies (a minimal sketch follows this list).
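
A minimal Python sketch of deterministic rollout with a discrete control set; the base policy mu, its truncation horizon, and the helper names are illustrative assumptions:

    def base_policy_cost(x, f, g, mu, horizon=50):
        """Truncated cost of following the base policy mu from state x
        (a finite-horizon stand-in for the infinite-horizon cost J_mu)."""
        cost = 0.0
        for _ in range(horizon):
            u = mu(x)
            cost += g(x, u)
            x = f(x, u)
        return cost

    def rollout_control(x, f, g, mu, controls, horizon=50):
        """One-step look-ahead that uses the base policy's cost as the terminal
        evaluation; per the lecture, a stable base policy yields a stable
        (and no worse) rollout policy."""
        return min(controls,
                   key=lambda u: g(x, u) + base_policy_cost(f(x, u), f, g, mu, horizon))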

Applications of MPC

  • Broad applicability of the Newton step view of MPC across various fields (e.g., robotics, aerospace).
  • Focus on discrete state/control applications at ASU.
    • Multi-agent robotics, data association, sequential decoding, etc.
  • Recent work on computer chess using an MPC-based architecture (MPCMC).

MPCMC - Improving Chess Performance

  • The MPCMC architecture plays against a nominal opponent model.
  • Positions are evaluated using existing chess engines (a rough schematic follows this list).
  • Computational results show effectiveness against various versions of the Stockfish engine.
  • Challenges in parallel computation due to move generation time.
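
A rough schematic of the architecture described above, as a one-step look-ahead against the nominal opponent. The helper functions legal_moves, apply_move, nominal_opponent_reply, and engine_evaluation are hypothetical placeholders for calls into existing chess engines, not the actual MPCMC code:

    def mpcmc_move(position, legal_moves, apply_move, nominal_opponent_reply,
                   engine_evaluation):
        """Schematic one-step look-ahead against a nominal opponent model.

        For each of our legal moves, the nominal opponent (e.g., a fixed engine)
        chooses a reply, and the resulting position is scored with an existing
        engine's evaluation; the move with the best resulting score is played.
        All helper functions here are hypothetical placeholders.
        """
        best_score, best_move = float("-inf"), None
        for move in legal_moves(position):
            after_our_move = apply_move(position, move)
            after_reply = nominal_opponent_reply(after_our_move)   # nominal opponent model
            score = engine_evaluation(after_reply)                 # engine-based position evaluation
            if score > best_score:
                best_score, best_move = score, move
        return best_move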

Conclusion and Final Thoughts

  • Emphasis on the synergy between online play and offline training in MPC.
  • Newton's method provides crucial insights for algorithm design and analysis.
  • Suggestion that both MPC and RL communities can learn from each other.
  • Importance of mathematical understanding in guiding future developments in MPC.

Q&A Session Highlights

  • Discussion on modeling opponents in control problems (deterministic vs stochastic).
  • Clarification on scoring in chess matches (e.g., 7.5 - 2.5).
  • Comparison with existing chess engines and the importance of non-pruning strategies.

Closing

  • Speaker expresses gratitude and invites further questions.