Introduction to Reinforcement Learning

Jul 22, 2024

Course Overview

  • Admin and logistics for the course
  • Audience: mix of for-credit students and auditors
  • Teaching materials and updates on the course website
  • Encouragement for interactivity and Q&A
  • Assessment structure and details
    • The Advanced Topics course is split between kernel methods and RL
    • Assessment is 50% coursework and 50% exam, with a choice of assignments
    • The exam allows a choice of questions
  • Recommended textbooks:
    • “Reinforcement Learning: An Introduction” by Richard Sutton and Andrew Barto
    • “Algorithms for Reinforcement Learning” by Csaba Szepesvári
    • Course notation follows the second edition of Sutton and Barto’s book

Understanding Reinforcement Learning (RL)

Placement in Science

  • RL intersects several fields: computer science, engineering, neuroscience, psychology, mathematics, economics
  • Fundamental question shared across these fields: the science of decision-making, i.e. how to make optimal decisions

RL Characteristics

  • No supervisor, only a reward signal (trial-and-error learning)
  • Delayed feedback (reward signal is not immediate)
  • Sequential decision-making (time matters)
  • Agent influences the environment (actions affect future observations and rewards); the interaction loop is sketched below
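
These characteristics meet in the agent-environment interaction loop: at every time step the agent receives an observation and a reward and chooses an action that affects what comes next. A minimal sketch of that loop in Python, assuming hypothetical Agent and Environment objects (the method names act, observe, reset and step are illustrative, not from any particular library):

    def run_episode(agent, env, max_steps=1000):
        """Run one episode of the agent-environment loop (hypothetical interfaces)."""
        observation = env.reset()                         # initial observation
        total_reward = 0.0
        for t in range(max_steps):
            action = agent.act(observation)               # agent chooses an action
            observation, reward, done = env.step(action)  # environment responds
            agent.observe(action, reward, observation)    # trial-and-error feedback
            total_reward += reward                        # reward accumulates over time
            if done:
                break
        return total_reward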

Examples of RL Problems

  • Flying stunt maneuvers in a helicopter
  • Playing Backgammon
  • Managing an investment portfolio
  • Controlling a power station
  • Making a humanoid robot walk
  • Learning to play various Atari games

Key Concepts in RL

Rewards

  • Scalar feedback signal at each time step, indicating how well the agent is doing
  • Goal: maximize cumulative reward over the long run (a discounted-return sketch follows)
  • Example reward definitions for the problems above, e.g. positive reward for winning a game, negative reward for a helicopter crash
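
One standard way to formalise “cumulative reward” is the discounted return, where a discount factor gamma (introduced formally later in the course) weights rewards received sooner more heavily than rewards received later. A small sketch; the reward sequence and gamma value are chosen purely for illustration:

    def discounted_return(rewards, gamma=0.9):
        """G = r_1 + gamma*r_2 + gamma^2*r_3 + ..., computed backwards."""
        g = 0.0
        for r in reversed(rewards):
            g = r + gamma * g
        return g

    # Example: no reward for two steps, then a reward of 1
    print(discounted_return([0.0, 0.0, 1.0], gamma=0.9))  # 0.81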

State Representations

  • Agent State: what the agent decides to store for decision-making
  • Environment State: the full state of the environment (usually not visible to the agent)
  • Information (Markov) State: a summary of the history containing all information useful for predicting the future (formalised below)
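
Formally (in the notation of Sutton and Barto), a state S_t is Markov if the future depends on the history only through the current state:

    P[S_{t+1} | S_t] = P[S_{t+1} | S_1, ..., S_t]

i.e. the state is a sufficient statistic of the history; once it is known, the rest of the history can be thrown away.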

Markov Decision Processes (MDPs)

  • Fully observable environments: agent sees the full state
  • Partially observable environments (POMDPs): the agent observes only part of the state and must construct its own state representation, e.g. from the history of observations
  • State representations aim to encapsulate all necessary information concisely
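
In the fully observable case this is commonly written as an MDP tuple ⟨S, A, P, R, γ⟩: states, actions, transition probabilities, rewards, and a discount factor (defined formally later in the course). In the partially observable case the agent must build its own agent state, for example from the full history of observations or from a belief over environment states.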

Components of an RL Agent

  • Policy: mapping from states to actions (deterministic or stochastic)
  • Value Function: prediction of expected future reward from a state or action
    • Helps to compare different states and actions
  • Model: the agent’s internal representation of how the environment changes state and what rewards it gives (a toy sketch of all three components follows this list)
    • Model-free methods: do not build a model of the environment
    • Model-based methods: build and use a model of the environment

Example Problem: Maze Navigation

  • Grid world example with goals, rewards, and states
  • Representation of policies and value functions
  • How a model can be used for planning and decision-making (a minimal grid-world sketch follows)
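
A minimal grid-world in that spirit, where the agent pays -1 per time step and the episode ends at the goal; the layout, rewards, and coordinates here are illustrative assumptions rather than the exact maze from the lecture:

    # 'S' start, 'G' goal, '#' wall, '.' free cell
    GRID = [
        "S..#",
        ".#.#",
        "...G",
    ]

    ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def step(state, action):
        """One transition: returns (next_state, reward, done)."""
        r, c = state
        dr, dc = ACTIONS[action]
        nr, nc = r + dr, c + dc
        # Stay put if the move leaves the grid or hits a wall
        if not (0 <= nr < len(GRID) and 0 <= nc < len(GRID[0])) or GRID[nr][nc] == "#":
            nr, nc = r, c
        done = GRID[nr][nc] == "G"
        return (nr, nc), -1.0, done   # -1 per step rewards shorter paths to the goal

    print(step((2, 2), "right"))      # ((2, 3), -1.0, True): reached the goal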

Taxonomy of RL Agents

  • Key types: Value-based, Policy-based, Actor-Critic
  • Model-Free vs. Model-Based RL

Key Problems in RL

Learning vs. Planning

  • Reinforcement Learning: learning optimal policies through interaction with an unknown environment
  • Planning: optimizing behaviour within a known model of the environment, without external interaction (a small planning sketch follows)
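
Planning with a fully known model can be sketched as value iteration: repeated one-step lookahead sweeps using the model alone, with no interaction with a real environment. The two-state deterministic model below is entirely made up for illustration:

    # (state, action) -> (next state, reward); a known, deterministic model
    model = {
        ("A", "go"):   ("B", 0.0),
        ("A", "stay"): ("A", 0.0),
        ("B", "go"):   ("A", 1.0),
        ("B", "stay"): ("B", 0.0),
    }
    states, actions, gamma = ["A", "B"], ["go", "stay"], 0.9

    V = {s: 0.0 for s in states}
    for _ in range(100):   # sweep until (approximately) converged
        V = {s: max(model[(s, a)][1] + gamma * V[model[(s, a)][0]] for a in actions)
             for s in states}
    print(V)               # computed purely from the model, i.e. planning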

Exploration vs. Exploitation

  • Balancing the discovery of new information against exploiting what is already known to be good (an epsilon-greedy sketch follows this list)
  • Examples: trying a new restaurant vs. a favourite one, showing a new online ad vs. the best-performing one, drilling in a new location vs. a known site, trying a new game strategy vs. the current best
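
One of the simplest strategies for this trade-off is epsilon-greedy (covered properly later in the course): with a small probability explore a random option, otherwise exploit the option that currently looks best. A sketch with invented value estimates:

    import random

    def epsilon_greedy(estimated_values, epsilon=0.1):
        """Explore with probability epsilon, otherwise pick the best-known option."""
        if random.random() < epsilon:
            return random.choice(list(estimated_values))        # explore: try something new
        return max(estimated_values, key=estimated_values.get)  # exploit current knowledge

    # Example: how much we currently think each restaurant is worth (made-up numbers)
    ratings = {"favourite": 8.0, "new_place": 5.0, "untried": 0.0}
    print(epsilon_greedy(ratings, epsilon=0.1))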

Prediction vs. Control

  • Prediction: evaluating how much reward a given, fixed policy will obtain
  • Control: finding the policy that obtains the most reward (a short sketch of both follows)
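
The two problems can be shown side by side on the same tiny made-up model used in the planning sketch above: prediction evaluates a fixed policy, control improves it. All states, actions, and values here are illustrative:

    model = {("A", "go"): ("B", 0.0), ("A", "stay"): ("A", 0.0),
             ("B", "go"): ("A", 1.0), ("B", "stay"): ("B", 0.0)}
    states, actions, gamma = ["A", "B"], ["go", "stay"], 0.9

    def evaluate(policy, sweeps=200):
        """Prediction: estimate the value of following a fixed policy."""
        V = {s: 0.0 for s in states}
        for _ in range(sweeps):
            V = {s: model[(s, policy[s])][1] + gamma * V[model[(s, policy[s])][0]]
                 for s in states}
        return V

    def improve(V):
        """One control step: make the policy greedy with respect to V."""
        return {s: max(actions, key=lambda a: model[(s, a)][1] + gamma * V[model[(s, a)][0]])
                for s in states}

    V = evaluate({"A": "stay", "B": "stay"})   # prediction for a deliberately poor policy
    print(V, improve(V))                       # control: one step of greedy policy improvement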

Course Structure and Topics

  • Split into two halves: fundamentals and scaling up to complex problems
  • Focus on concepts, followed by applications in the latter half
  • Case studies of successful RL applications in the final part of the course