Introduction to Reinforcement Learning
Course Overview
Admin and logistics for the course
Audience: mix of for-credit students and auditors
Teaching materials and updates on the course website
Encouragement for interactivity and Q&A
Assessment structure and details
The Advanced Topics course is split between kernel methods and RL
50% coursework, 50% exam, choice of assignments
The exam allows a choice of questions
Recommended textbooks:
“Reinforcement Learning: An Introduction” by Richard Sutton and Andrew Barto
“Algorithms for Reinforcement Learning” by Csaba Szepesvári
Course notation follows the second edition of Sutton and Barto's book
Understanding Reinforcement Learning (RL)
Placement in Science
RL intersects several fields: computer science, engineering, neuroscience, psychology, mathematics, economics
Fundamental question shared across these fields: how to make optimal decisions
RL Characteristics
No supervisor, only a reward signal (trial-and-error learning)
Delayed feedback (reward signal is not immediate)
Sequential decision-making (time matters)
Agent influences the environment (actions affect future rewards)
Examples of RL Problems
Flying stunt maneuvers in a helicopter
Playing Backgammon
Managing an investment portfolio
Controlling a power station
Making a humanoid robot walk
Learning to play various Atari games
Key Concepts in RL
Rewards
A scalar feedback signal R_t at each time step, indicating how well the agent is doing
Goal: maximize cumulative reward (formalized just after this list)
Examples of reward structures for different RL problems
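A compact way to write the goal above (standard Sutton and Barto notation rather than a quote from the lecture; the discount factor gamma is part of this formulation and is introduced properly later in the course): the return G_t is the cumulative discounted reward from time step t, and the agent acts to maximize its expected value.

    G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}, \qquad \gamma \in [0, 1]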
State Representations
Agent State: what the agent decides to store for decision-making
Environment State: the full state of the environment (usually not visible to the agent)
Information (Markov) State: a summary that encompasses all relevant information for determining future states
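The Markov property behind the information state can be stated precisely (standard definition, consistent with the course's conventions): a state S_t is Markov if and only if

    \mathbb{P}[S_{t+1} \mid S_t] = \mathbb{P}[S_{t+1} \mid S_1, \ldots, S_t]

i.e. the future is independent of the past given the present, so the state is a sufficient summary of the history.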
Markov Decision Processes (MDPs)
Fully observable environments: agent sees the full state
Partially observable environments (POMDPs): agent sees partial state, must infer the rest
State representations aim to encapsulate all necessary information concisely
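A minimal sketch of the agent-environment loop implied by this framing; the class and method names here (reset, step, act, observe) are illustrative choices, not the lecture's or any particular library's API.

    # Sketch of one episode of agent-environment interaction.
    class Environment:
        def reset(self):
            """Return the initial observation."""
            raise NotImplementedError

        def step(self, action):
            """Apply the action; return (observation, reward, done)."""
            raise NotImplementedError

    class Agent:
        def act(self, observation):
            """Choose an action given the current observation/agent state."""
            raise NotImplementedError

        def observe(self, observation, reward):
            """Update the agent's internal state from the feedback it received."""
            pass

    def run_episode(env, agent):
        observation = env.reset()
        total_reward, done = 0.0, False
        while not done:
            action = agent.act(observation)                # agent acts
            observation, reward, done = env.step(action)   # environment responds
            agent.observe(observation, reward)             # agent learns
            total_reward += reward                         # cumulative reward
        return total_reward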
Components of an RL Agent
Policy: mapping from states to actions (deterministic or stochastic)
Value Function: prediction of expected future reward from a state or action
Helps to compare different states and actions
Model: the agent's representation of the environment's dynamics and rewards
Model-free methods: do not build a model of the environment
Model-based methods: build and use a model of the environment
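In the notation the course follows (Sutton and Barto style; the exact symbols below are a standard reconstruction rather than a quote), the three components can be written as

    \pi(a \mid s) = \mathbb{P}[A_t = a \mid S_t = s]                                                      (stochastic policy)
    v_\pi(s) = \mathbb{E}_\pi[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \mid S_t = s]          (value function)
    \mathcal{P}^a_{ss'} = \mathbb{P}[S_{t+1} = s' \mid S_t = s, A_t = a],\quad \mathcal{R}^a_s = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a]   (model)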
Example Problem: Maze Navigation
Grid world example with goals, rewards, and states
Representation of policies and value functions
How to use models for planning and decision-making
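To make the policy/value distinction concrete, here is a toy stand-in for the maze; the 4-state corridor, the -1-per-step reward, and all helper names are illustrative assumptions, not the lecture's exact grid world.

    # Tiny corridor "maze": states 0..3, goal at state 3, reward -1 per step.
    N_STATES, GOAL = 4, 3
    ACTIONS = {"left": -1, "right": +1}

    def step(state, action):
        """Move within the corridor; the episode ends at the goal."""
        next_state = min(max(state + ACTIONS[action], 0), GOAL)
        return next_state, -1.0, next_state == GOAL

    policy = {0: "right", 1: "right", 2: "right"}   # deterministic: head to the goal

    def evaluate(policy, sweeps=100):
        """Iterative policy evaluation (undiscounted) for this deterministic world."""
        value = [0.0] * N_STATES
        for _ in range(sweeps):
            for s in range(GOAL):                   # goal state keeps value 0
                s2, r, done = step(s, policy[s])
                value[s] = r + (0.0 if done else value[s2])
        return value

    print(evaluate(policy))   # [-3.0, -2.0, -1.0, 0.0]: minus the steps remaining

Here the policy says what to do in each state, the value function says how good each state is under that policy, and a model would additionally store the transition and reward structure that the step function encodes.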
Taxonomy of RL Agents
Key types: Value-based, Policy-based, Actor-Critic
Model-Free vs. Model-Based RL
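One common way to lay these out (following the lecture's categories, paraphrased): value-based agents store a value function and derive their behaviour from it (the policy is implicit); policy-based agents store an explicit policy and no value function; actor-critic agents store both. Independently of that split, an agent is model-free if it learns directly from experience without a model, or model-based if it builds and plans with a model of the environment.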
Key Problems in RL
Learning vs. Planning
Reinforcement Learning: learning optimal policies through interaction with an unknown environment
Planning: optimizing within a known environment model
Exploration vs. Exploitation
Balancing the discovery of new information against exploiting what is already known
Examples include trying new restaurants, displaying online ads, oil drilling, game strategies
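One simple rule for managing this trade-off is epsilon-greedy action selection; the lecture poses the dilemma rather than this specific rule, so the sketch below is only an illustrative example.

    import random

    def epsilon_greedy(action_values, epsilon=0.1):
        """With probability epsilon pick a random action (explore);
        otherwise pick the action with the highest estimated value (exploit)."""
        if random.random() < epsilon:
            return random.choice(list(action_values))      # explore
        return max(action_values, key=action_values.get)   # exploit

    # Example: value estimates for the restaurant choice above (made-up numbers).
    estimates = {"favourite": 0.8, "new_place_a": 0.5, "new_place_b": 0.2}
    print(epsilon_greedy(estimates, epsilon=0.1))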
Prediction vs. Control
Prediction: evaluating how well a given policy performs
Control: optimizing the policy for maximum reward
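In value-function terms (standard notation, consistent with the course's conventions):

    Prediction: estimate v_\pi(s), the value of a fixed, given policy \pi
    Control:    find v_*(s) = \max_\pi v_\pi(s) and an optimal policy \pi_*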
Course Structure and Topics
Split into two halves: fundamentals and scaling up to complex problems
Focus on concepts, followed by applications in the latter half
Case studies of successful RL applications in the final part of the course