Introduction to Reinforcement Learning
Course Overview
Admin and logistics for the course
Audience: mix of for-credit students and auditors
Teaching materials and updates on the course website
Encouragement for interactivity and Q&A
Assessment structure and details
The Advanced Topics course is split between kernel methods and RL
50% coursework, 50% exam, choice of assignments
The exam allows a choice of questions
Recommended textbooks:
“Reinforcement Learning: An Introduction” by Richard Sutton and Andrew Barto
“Algorithms for Reinforcement Learning” by Csaba Szepesvári
Course notation follows the second edition of Sutton and Barto's book
Understanding Reinforcement Learning (RL)
Placement in Science
RL intersects several fields: computer science, engineering, neuroscience, psychology, mathematics, economics
Fundamental question shared across these fields: how to make optimal decisions
RL Characteristics
No supervisor, only a reward signal (trial-and-error learning)
Delayed feedback (reward signal is not immediate)
Sequential decision-making (time matters)
Agent influences the environment (actions affect future rewards)
Examples of RL Problems
Flying stunt maneuvers in a helicopter
Playing Backgammon
Managing an investment portfolio
Controlling a power station
Making a humanoid robot walk
Learning to play various Atari games
Key Concepts in RL
Rewards
A scalar feedback signal R_t at each time step, indicating how well the agent is doing
Goal: maximize cumulative reward (formalized just after this list)
Examples of reward structures for different RL problems
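A compact way to write the goal above (standard Sutton and Barto notation rather than a quote from the lecture; the discount factor gamma is part of this formulation and is introduced properly later in the course): the return G_t is the cumulative discounted reward from time step t, and the agent acts to maximize its expected value.

    G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}, \qquad \gamma \in [0, 1]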
State Representations
Agent State: what the agent decides to store for decision-making
Environment State: the full state of the environment (usually not visible to the agent)
Information (Markov) State: a summary that encompasses all relevant information for determining future states
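The Markov property behind the information state can be stated precisely (standard definition, consistent with the course's conventions): a state S_t is Markov if and only if

    \mathbb{P}[S_{t+1} \mid S_t] = \mathbb{P}[S_{t+1} \mid S_1, \ldots, S_t]

i.e. the future is independent of the past given the present, so the state is a sufficient summary of the history.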
Markov Decision Processes (MDPs)
Fully observable environments: agent sees the full state
Partially observable environments (POMDPs): agent sees partial state, must infer the rest
State representations aim to encapsulate all necessary information concisely
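A minimal sketch of the agent-environment loop implied by this framing; the class and method names here (reset, step, act, observe) are illustrative choices, not the lecture's or any particular library's API.

    # Sketch of one episode of agent-environment interaction.
    class Environment:
        def reset(self):
            """Return the initial observation."""
            raise NotImplementedError

        def step(self, action):
            """Apply the action; return (observation, reward, done)."""
            raise NotImplementedError

    class Agent:
        def act(self, observation):
            """Choose an action given the current observation/agent state."""
            raise NotImplementedError

        def observe(self, observation, reward):
            """Update the agent's internal state from the feedback it received."""
            pass

    def run_episode(env, agent):
        observation = env.reset()
        total_reward, done = 0.0, False
        while not done:
            action = agent.act(observation)                # agent acts
            observation, reward, done = env.step(action)   # environment responds
            agent.observe(observation, reward)             # agent learns
            total_reward += reward                         # cumulative reward
        return total_reward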
Components of an RL Agent
Policy: mapping from states to actions (deterministic or stochastic)
Value Function: prediction of expected future reward from a state or action
Helps to compare different states and actions
Model: the agent's representation of the environment's dynamics and rewards
Model-free methods: do not build a model of the environment
Model-based methods: build and use a model of the environment
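In the notation the course follows (Sutton and Barto style; the exact symbols below are a standard reconstruction rather than a quote), the three components can be written as

    \pi(a \mid s) = \mathbb{P}[A_t = a \mid S_t = s]                                                      (stochastic policy)
    v_\pi(s) = \mathbb{E}_\pi[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \mid S_t = s]          (value function)
    \mathcal{P}^a_{ss'} = \mathbb{P}[S_{t+1} = s' \mid S_t = s, A_t = a],\quad \mathcal{R}^a_s = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a]   (model)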
Example Problem: Maze Navigation
Grid world example with goals, rewards, and states
Representation of policies and value functions
How to use models for planning and decision-making
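To make the policy/value distinction concrete, here is a toy stand-in for the maze; the 4-state corridor, the -1-per-step reward, and all helper names are illustrative assumptions, not the lecture's exact grid world.

    # Tiny corridor "maze": states 0..3, goal at state 3, reward -1 per step.
    N_STATES, GOAL = 4, 3
    ACTIONS = {"left": -1, "right": +1}

    def step(state, action):
        """Move within the corridor; the episode ends at the goal."""
        next_state = min(max(state + ACTIONS[action], 0), GOAL)
        return next_state, -1.0, next_state == GOAL

    policy = {0: "right", 1: "right", 2: "right"}   # deterministic: head to the goal

    def evaluate(policy, sweeps=100):
        """Iterative policy evaluation (undiscounted) for this deterministic world."""
        value = [0.0] * N_STATES
        for _ in range(sweeps):
            for s in range(GOAL):                   # goal state keeps value 0
                s2, r, done = step(s, policy[s])
                value[s] = r + (0.0 if done else value[s2])
        return value

    print(evaluate(policy))   # [-3.0, -2.0, -1.0, 0.0]: minus the steps remaining

Here the policy says what to do in each state, the value function says how good each state is under that policy, and a model would additionally store the transition and reward structure that the step function encodes.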
Taxonomy of RL Agents
Key types: Value-based, Policy-based, Actor-Critic
Model-Free vs. Model-Based RL
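One common way to lay these out (following the lecture's categories, paraphrased): value-based agents store a value function and derive their behaviour from it (the policy is implicit); policy-based agents store an explicit policy and no value function; actor-critic agents store both. Independently of that split, an agent is model-free if it learns directly from experience without a model, or model-based if it builds and plans with a model of the environment.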
Key Problems in RL
Learning vs. Planning
Reinforcement Learning: learning optimal policies through interaction with an unknown environment
Planning: optimizing within a known environment model
Exploration vs. Exploitation
Balancing the discovery of new information against exploiting what is already known
Examples include trying new restaurants, displaying online ads, oil drilling, game strategies
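One simple rule for managing this trade-off is epsilon-greedy action selection; the lecture poses the dilemma rather than this specific rule, so the sketch below is only an illustrative example.

    import random

    def epsilon_greedy(action_values, epsilon=0.1):
        """With probability epsilon pick a random action (explore);
        otherwise pick the action with the highest estimated value (exploit)."""
        if random.random() < epsilon:
            return random.choice(list(action_values))      # explore
        return max(action_values, key=action_values.get)   # exploit

    # Example: value estimates for the restaurant choice above (made-up numbers).
    estimates = {"favourite": 0.8, "new_place_a": 0.5, "new_place_b": 0.2}
    print(epsilon_greedy(estimates, epsilon=0.1))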
Prediction vs. Control
Prediction: evaluating how well a given policy performs
Control: optimizing the policy for maximum reward
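In value-function terms (standard notation, consistent with the course's conventions):

    Prediction: estimate v_\pi(s), the value of a fixed, given policy \pi
    Control:    find v_*(s) = \max_\pi v_\pi(s) and an optimal policy \pi_*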
Course Structure and Topics
Split into two halves: fundamentals and scaling up to complex problems
Focus on concepts, followed by applications in the latter half
Case studies of successful RL applications in the final part of the course