Overview
Interview with Tuomas Sandholm on game theory, AI breakthroughs in poker, defense, and kidney exchange, plus views on LLMs, ethics, and entrepreneurship.
Guest Background
- Finnish-American computer scientist; TKK/Aalto graduate (1991); PhD in USA; professor and entrepreneur.
- Recognized by Goldman Sachs among top 100 entrepreneurs; winner of Alfred Kordelin Prize.
- H-index 98: at least 98 papers each cited ≥98 times.
Game Theory: Core Ideas
- Studies strategic decision-making with multiple players where outcomes depend on others’ strategies.
- Solution concepts (e.g., Nash equilibrium): no player benefits by unilateral deviation.
- Perfect information games: all states visible (chess, go). Imperfect information games: hidden information (poker).
- Sandholm’s vision: solve large, realistic games computationally, not by hand.
Poker AI Breakthroughs
- Began poker research in 2004; early losses, steady algorithmic improvement.
- 2015: Claudico lost to pros; key learnings.
- 2017: Libratus decisively beat four top-10 heads-up No-Limit Texas Hold’em pros.
- 2019: Pluribus achieved superhuman performance in six-player No-Limit Hold’em.
Why It Mattered
- Imperfect-information game benchmark solved at superhuman level.
- Real-world applications (defense, business) are mostly imperfect-information games.
Key Algorithmic Innovations
- Real-time reasoning for imperfect-information games: compute situational strategy while maintaining global balance.
- Middlegame reasoning for scalability in multi-player settings.
- Approximate equilibria computed; full game size is astronomical (~10^161 states for 2-player, 200BB).
Human Strategy Insights from Bots
- Overbets and small underbets gained prominence.
- Bot behaviors contradicting poker books:
- Limping (8–12% in heads-up) used effectively with proper follow-up strategy.
- Donk betting used in specific, probabilistic contexts.
Large Language Models (LLMs) vs Game-Theoretic AI
- Poker/strategy AIs optimize directly from rules; no human data training; produce novel strategies.
- LLMs: token-by-token text prediction; issues include hallucinations, data scarcity, self-training on outputs.
- Strategic reasoning relies on game trees and information sets; fundamentally different from word sequences.
- LLMs useful, but less reliable for strategy; risk of degrading users’ own skills with overreliance.
Kidney Exchange: AI Saving Lives
- Problem: Living donor pairs often incompatible; deceased donor supply insufficient.
- Solution: National kidney exchange optimizes cycles (rings) and chains.
- Chains from altruistic donors (no return kidney) enable ongoing matches; first chain run 2006; widely adopted by 2011.
- Since 2010, Sandholm’s algorithms have run the US national exchange weekly; 80% of centers participate.
- Result: Over 10,000 life-saving transplants enabled via chains worldwide.
Optimization and Incentives
- Finding optimal combination of rings and chains is NP-hard; algorithms compute provably optimal matchings.
- Hospitals often “cheat” by doing local matches first; all centers exploited this when possible.
- Proposed credit mechanisms to improve incentives (conceptual work).
Strategy Robot (Company)
- Products: software based on computational game theory for defense applications; bootstrapped and profitable.
- Use cases across tactical, operational, and strategic levels:
- Fighter tactics in sensor-driven, imperfect-information environments.
- Carrier strike group and drone swarm employment.
- Strategic stability: nuclear deterrence, escalation/de-escalation modeling.
- Not autonomous physical robots; “robot” is metaphor for strategy generation.
- Collaboration with US DoD; details limited to public disclosures.
Ethics and Life-or-Death Decisions
- Kidney exchange: algorithms decide weekly who gets organs; humans set values, algorithms optimize policy.
- Ends vs means distinction:
- Ends (value judgments) are human: maximize number of transplants, added life-years, prioritize highly sensitized patients, etc.
- Means (policy optimization) are computational: map value goals to optimal allocations.
- Defense work follows existing DoD policies; air-to-air contexts avoid civilian dilemmas.
Combinatorial Auctions and Mechanism Design
- Founded CombineNet; ran large-scale combinatorial procurement auctions (e.g., P&G trucking).
- Innovations:
- Expressive bidding languages and fast optimization for NP-hard allocation problems.
- Automated mechanism design insights; some optimal designs inherently exponential in size.
- Practical example: capturing known cost advantages (e.g., Korean aluminum suppliers) via tailored mechanisms to reduce buyer costs.
Views on AI, Society, and Art
- Future of work: high-skill professions disrupted; specialists remain valuable longer than generalists.
- Art: AI-human hybrids will produce compelling works; competitions may need human-only categories.
- Risk: cognitive decline from overuse of generative tools; observed classroom performance gaps between homework (with tools) and exams.
Media, Public Reaction, and Security
- Poker community backlash due to livelihood threats and bot proliferation on sites (not by Sandholm’s team).
- US law changes and bot risks reduced online poker profitability.
- Social media hostility experienced; no threats reported.
Investments and Companies
- Entrepreneurial track: CombineNet (sold 2010), several smaller firms, Strategy Robot.
- Investing approach: buy/hold, focus on understood businesses; example: AppLovin high appreciation.
- Positive on Google for AI talent and valuation (not investment advice).
Key Terms & Definitions
- H-index: researcher has H papers each cited ≥H times (impact metric).
- Nash equilibrium: strategy profile where no player benefits by unilateral deviation.
- Imperfect-information game: players have private information (e.g., poker hands).
- Real-time reasoning: computing situational strategies while preserving global equilibrium constraints.
- NP-hard: class of problems that are computationally intractable to solve exactly at scale.
- Altruistic donor: a donor who gives without a designated recipient, enabling chains.
Structured Highlights
| Topic | Problem | Approach | Result/Impact |
|---|
| Poker AI (Heads-up) | Imperfect information; massive state space | Real-time reasoning; approximate equilibrium | Libratus beat top pros (2017) |
| Poker AI (Multi-player) | Larger game trees; multiple equilibria | Middlegame reasoning; scalable algorithms | Pluribus superhuman (2019) |
| Kidney Exchange | Incompatibility; low deceased donor supply | Optimal rings and chains; altruistic chain innovation | National weekly optimization; 10,000+ transplants via chains |
| Defense Tactics | Sensor-based, hidden-state combat | Computational game theory for tactics/strategy | Best-in-class fighter tactics; strategic stability modeling |
| Combinatorial Procurement | Complex supplier constraints | Expressive bids + NP-hard optimization | Global auctions; significant cost reductions |
| Mechanism Design | Optimal sales design complexity | Automated mechanism design | Proofs of exponential-size optimal mechanisms |
Action Items / Next Steps
- For policy makers: specify value objectives (e.g., life-years vs count) explicitly; let algorithms optimize.
- For defense planners: incorporate computational game-theoretic tools into escalation stability analysis.
- For researchers: expand real-time reasoning methods to broader imperfect-information domains.
- For educators: design assessments that sustain human skill development alongside AI tool use.