Exploring Model Predictive Control and Chess
Sep 11, 2024
Notes on Model Predictive Control and Reinforcement Learning Lecture
Introduction
Speaker thanks the organizers for the invitation.
The lecture is based on a course at ASU and recent publications related to reinforcement learning.
Relevant books are available as PDFs on the speaker's website.
Lecture Outline
Connection between Reinforcement Learning (RL) and Model Predictive Control (MPC).
Use of computer chess as a case study.
Introduction of a new theoretical framework for MPC based on Newton's method.
Focus on:
One-step look-ahead.
Multi-step look-ahead.
Stability issues, rollout, and policy iteration algorithms.
Recent applications, particularly in computer chess.
Computer Chess
Chess has evolved significantly since AlphaZero's introduction in 2017.
AlphaZero uses two algorithms:
Offline Training Algorithm: learns position evaluation before playing.
Online Play Algorithm: uses multi-step look-ahead and position evaluation.
Focus on the importance of the online play algorithm.
Model Predictive Control (MPC)
MPC involves:
L-step look-ahead optimization.
Cost function approximation.
Deterministic optimal control problem is considered:
State x_k and control u_k evolve according to the system equations.
Costs depend on state and control.
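In the standard discrete-time formulation (a minimal statement, consistent with the notation above):
x_{k+1} = f(x_k, u_k), k = 0, 1, ...,
with total cost the sum of the stage costs g(x_k, u_k) over k.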
Classical form of MPC:
Solve for the first L controls, apply the first control, discard the rest, and repeat.
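A minimal Python sketch of this receding-horizon loop; solve_l_step is a hypothetical solver for the L-step look-ahead problem, and f is the system function:

def mpc_control_loop(x0, f, solve_l_step, L, num_steps):
    x = x0
    trajectory = [x]
    for _ in range(num_steps):
        u_sequence = solve_l_step(x, L)  # optimize over the next L controls
        u = u_sequence[0]                # apply only the first control
        x = f(x, u)                      # discard the rest and advance the system
        trajectory.append(x)
    return trajectory

Because only the first control is applied before re-solving at the new state, the loop defines an online policy rather than a fixed open-loop control sequence.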
Comparisons Between MPC and Computer Chess
Similarities in approach:
Both involve multi-step look-ahead and cost evaluation.
Differences:
Chess has discrete state and control spaces; MPC often has continuous spaces.
Chess look-ahead trees are pruned for efficiency, whereas MPC optimizes exactly.
Viewing chess as a one-player game against a fixed opponent.
Principal Viewpoints
Online play with one-step look-ahead acts as a Newton step for solving the Bellman equation (formalized after this list).
Offline training provides, and improves, the starting point of this Newton step for one-step and multi-step look-ahead.
The framework applies to various problems, including stochastic, deterministic, and multi-agent systems.
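In Bellman-operator notation (a brief formalization of this viewpoint): the Bellman operator is
(T J)(x) = min over u of [ g(x, u) + J(f(x, u)) ].
One-step look-ahead from a cost function approximation J~ selects the policy μ~ for which T_μ~ J~ = T J~. The cost J_μ~ of this policy is the result of a single Newton iteration for solving the Bellman equation J = T J, starting from J~.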
Newton's Theoretical Framework
Focus on linear quadratic problems for visualization.
Optimal cost function is quadratic; optimal control is linear.
The Riccati equation determines the optimal cost function.
Value iteration can be used for iterative solutions.
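A minimal Python sketch of value iteration for a scalar linear quadratic problem (system x_{k+1} = a*x_k + b*u_k, stage cost q*x^2 + r*u^2; the parameter values below are illustrative, not from the lecture):

# Riccati operator: maps the coefficient K of a quadratic cost K*x^2
# to the coefficient of the cost obtained by one step of value iteration.
def riccati_operator(K, a, b, q, r):
    return a * a * r * K / (r + b * b * K) + q

# Value iteration: apply the Riccati operator until it reaches
# its fixed point, the optimal cost coefficient K*.
def value_iteration(a=1.2, b=1.0, q=1.0, r=1.0, tol=1e-10):
    K = 0.0  # start from the zero cost function
    while True:
        K_next = riccati_operator(K, a, b, q, r)
        if abs(K_next - K) < tol:
            return K_next
        K = K_next

For this scalar example the iterates increase monotonically from K = 0 and converge to the optimal coefficient K*.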
Visualization and Iteration
Graphical representation of the Riccati equation’s solution through iteration.
Linear policies represented as lines in the graph; stable policies have slopes less than 1.
The relationship between Riccati operator and policy lines.
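Concretely, in the scalar example above (a sketch of the visualization): a linear policy u = m*x corresponds to the line
F_m(K) = (a + b*m)^2 * K + q + r*m^2,
with slope (a + b*m)^2. The policy is stable exactly when |a + b*m| < 1, i.e., when the slope is less than 1, and its cost coefficient is the intersection of this line with the 45-degree line.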
One-Step vs Multi-Step Look-Ahead
L-step look-ahead as one step of look-ahead on a modified cost function (see the identity after this list).
Importance of accurately performing the first step of look-ahead for good convergence properties.
Certainty equivalence can simplify calculations in stochastic problems, especially for look-ahead steps.
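The reduction above, stated as an identity (standard in this framework): L-step look-ahead from J~ is one-step look-ahead from T^{L-1} J~, i.e., the policy μ~ it produces satisfies
T_μ~ (T^{L-1} J~) = T (T^{L-1} J~).
Longer look-ahead thus moves the Newton step's starting point from J~ to T^{L-1} J~, which is typically closer to the optimal cost J*, improving convergence.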
Stability Issues
Stability of MPC policies varies based on cost function approximations.
Regions of stability and instability defined by slopes of corresponding MPC policies.
Role of rollout algorithms in producing stable policies from stable base policies.
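A minimal deterministic rollout sketch in Python (hypothetical helpers: f is the system function, g the stage cost, base_policy any stabilizing heuristic):

def rollout_control(x, controls, f, g, base_policy, horizon):
    best_u, best_cost = None, float("inf")
    for u in controls:
        cost = g(x, u)                 # try each first control
        x_next = f(x, u)
        for _ in range(horizon):       # then follow the base policy
            u_b = base_policy(x_next)
            cost += g(x_next, u_b)
            x_next = f(x_next, u_b)
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u                      # performs at least as well as the base policy

This is one-step look-ahead with the base policy's cost as the approximation, which is why a stable base policy yields a stable rollout policy.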
Applications of MPC
Broad applicability of the Newton step view of MPC across various fields (e.g., robotics, aerospace).
Focus on discrete state/control applications at ASU.
Multi-agent robotics, data association, sequential decoding, etc.
Recent work on computer chess using an MPC architecture (MPCMC).
MPCMC - Improving Chess Performance
MPCMC architecture utilizes a nominal opponent.
Evaluates positions using existing chess engines (see the sketch after this list).
Computational results show effectiveness against various versions of the Stockfish engine.
Challenges in parallel computation due to move generation time.
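A hedged sketch of the one-step look-ahead idea against a nominal opponent (not the authors' actual code; legal_moves, make, engine_eval, and nominal_opponent_move are hypothetical wrappers around an existing engine such as Stockfish):

def select_move(pos):
    best_move, best_value = None, float("-inf")
    for move in legal_moves(pos):
        after_us = make(pos, move)
        reply = nominal_opponent_move(after_us)  # fixed model of the opponent
        after_them = make(after_us, reply) if reply else after_us
        value = engine_eval(after_them)          # engine's position evaluation
        if value > best_value:
            best_move, best_value = move, value
    return best_move

Modeling the opponent as a fixed engine turns the two-player game into a one-player optimal control problem, matching the MPC viewpoint above.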
Conclusion and Final Thoughts
Emphasis on the synergy between online play and offline training in MPC.
Newton's method provides crucial insights for algorithm design and analysis.
Suggestion that both MPC and RL communities can learn from each other.
Importance of mathematical understanding in guiding future developments in MPC.
Q&A Session Highlights
Discussion on modeling opponents in control problems (deterministic vs stochastic).
Clarification on scoring in chess matches (e.g., 7.5 - 2.5).
Comparison with existing chess engines and the importance of non-pruning strategies.
Closing
Speaker expresses gratitude and invites further questions.