Hado van Hasselt, Research Scientist at DeepMind, London
Teaches course at UCL
Pre-recorded due to COVID-19 pandemic
Co-lecturers: Diana Borsa, Matteo Hessel
Course Structure
Course topic: Reinforcement Learning (RL)
Recommended book: "Reinforcement Learning: An Introduction" by Richard Sutton and Andrew Barto
Course admin for UCL credit students:
Use Moodle for updates and communication
Assignments graded, no exams
Introduction to Reinforcement Learning
What is Reinforcement Learning?
A method for machines to learn how to make decisions and achieve goals through interaction.
Involves concepts of learning, autonomy, and decision-making.
Relation to AI
AI aims to find solutions autonomously.
Historical context: From the Industrial Revolution (physical automation) to the Digital Revolution (mental automation) to AI (autonomous solution-finding).
Alan Turing's 1950 paper "Computing Machinery and Intelligence" proposes building a machine that learns like a child, rather than programming adult-level intelligence directly.
Key Concepts
Artificial Intelligence
Goal: Learn to make decisions to achieve goals.
RL serves as a formal framework for this decision-making view of AI.
Reinforcement Learning Characteristics
Active learning, interaction with environment
Sequential interactions
Goal-directed actions
Can learn without examples of optimal behavior
Fundamental Components of RL
Interaction Loop
Agent interacts with the environment
Agent executes actions and observes resulting changes.
Goal: Optimize long-term cumulative reward.
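A minimal sketch of this interaction loop, assuming generic environment/agent objects with reset, step, act, and observe methods (all names are illustrative, not from the course):

```python
# Minimal agent-environment interaction loop (illustrative sketch; `env` and `agent`
# are hypothetical objects, not part of the course material).

def run_episode(env, agent):
    """Run one episode and return the cumulative (undiscounted) reward."""
    observation = env.reset()                             # initial observation
    total_reward = 0.0
    done = False
    while not done:
        action = agent.act(observation)                    # agent selects an action
        observation, reward, done = env.step(action)       # environment returns observation, reward, termination
        agent.observe(action, observation, reward, done)   # agent may update its internal state / learn
        total_reward += reward
    return total_reward
```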
Reward Hypothesis
Any goal can be formalized as the outcome of maximizing a cumulative reward signal.
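One standard formalization of the cumulative reward being maximized is the discounted return (following Sutton & Barto; not a verbatim quote from the lecture):

```latex
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots
    = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1},
\qquad 0 \le \gamma \le 1
```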
Examples of RL Applications
Flying helicopters, managing investment portfolios, controlling power stations, making robots walk, playing games.
Example: Atari games used to test RL algorithms like DQN.
Reinforcement Learning Problem
Defined by learning to make decisions from interaction.
Involves prediction and control.
Ambitious in scope but promising for generic problem-solving.
Reinforcement Learning Components
Agent State
The agent's internal state: a summary of its interaction history used to select actions.
Environments may be fully observable (the agent state can simply be the latest observation) or partially observable (the agent must construct its state from the history of observations).
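One common way to formalize the agent-state update (a hedged sketch; u is a generic state-update function, e.g. a recurrent network):

```latex
S_t = u\!\left(S_{t-1}, A_{t-1}, R_t, O_t\right)
```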
Policy
Maps states to actions.
Can be deterministic or stochastic.
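A small illustrative sketch of the two policy types (the state/action spaces and probabilities are made up for illustration):

```python
import numpy as np

def deterministic_policy(state):
    """Maps each state to exactly one action (hypothetical lookup table)."""
    action_table = {0: 1, 1: 0, 2: 1}          # state -> action
    return action_table[state]

def stochastic_policy(state, rng=np.random.default_rng(0)):
    """Maps each state to a distribution over actions and samples from it."""
    action_probs = {0: [0.9, 0.1], 1: [0.5, 0.5], 2: [0.2, 0.8]}  # pi(a | s)
    probs = action_probs[state]
    return rng.choice(len(probs), p=probs)
```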
Value Function
Expected return from a state under a policy.
Satisfies a recursive relation (the Bellman equation) and can be used for policy evaluation.
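The recursion referred to above is the Bellman expectation equation for the state-value function (standard form, stated here for reference):

```latex
v_\pi(s) = \mathbb{E}_\pi\!\left[ R_{t+1} + \gamma\, v_\pi(S_{t+1}) \,\middle|\, S_t = s \right]
```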
Model
Predicts environment dynamics.
Used for planning optimal strategies.
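A hedged sketch of a simple learned tabular model estimated from observed counts (all class and method names are illustrative assumptions):

```python
from collections import defaultdict

class TabularModel:
    """Approximates transition probabilities and expected rewards from observed transitions."""

    def __init__(self):
        self.transition_counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
        self.reward_sums = defaultdict(float)                           # (s, a) -> summed reward
        self.visit_counts = defaultdict(int)                            # (s, a) -> visits

    def update(self, state, action, reward, next_state):
        self.transition_counts[(state, action)][next_state] += 1
        self.reward_sums[(state, action)] += reward
        self.visit_counts[(state, action)] += 1

    def expected_reward(self, state, action):
        n = self.visit_counts[(state, action)]
        return self.reward_sums[(state, action)] / n if n else 0.0

    def transition_probs(self, state, action):
        n = self.visit_counts[(state, action)]
        counts = self.transition_counts[(state, action)]
        return {s2: c / n for s2, c in counts.items()} if n else {}
```

A planner can then use transition_probs and expected_reward to simulate experience without touching the real environment.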
Categories of RL Agents
Value-Based: Learns value function, policy derived from it.
Policy-Based: Learns policy directly.
Actor-Critic: Both value function and policy are explicitly represented.
Model-Free: No explicit model of environment.
Model-Based: Includes explicit model for planning.
Sub-Problems in RL
Prediction: Evaluate value function for a given policy.
Control: Optimize policy to maximize rewards.
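A minimal sketch of the prediction sub-problem via every-visit Monte-Carlo evaluation (sample_episode is a hypothetical helper returning a list of (state, reward) pairs for one episode under the given policy):

```python
from collections import defaultdict

def monte_carlo_evaluation(sample_episode, policy, num_episodes=1000, gamma=0.99):
    """Estimate v_pi(s) by averaging sampled discounted returns from each visited state."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(num_episodes):
        episode = sample_episode(policy)          # [(state, reward), ...] for one episode
        g = 0.0
        for state, reward in reversed(episode):   # accumulate the return backwards in time
            g = reward + gamma * g
            returns_sum[state] += g
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}
```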
Learning vs. Planning
Learning: The environment is initially unknown; the agent improves by gathering experience and updating its knowledge.
Planning: The agent uses internal computation over a (given or learned) model to improve its policy or value estimates without further interaction.
Use of Deep Learning
Deep learning is integrated as function approximation to handle large-scale RL problems.
Challenges include non-stationarity and data correlation in RL.
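A hedged sketch of combining deep learning with RL: a small Q-value network and a single TD-style update in PyTorch (the architecture, dimensions, and transition values are assumptions, not the course's reference implementation):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps an observation to one Q-value per discrete action."""
    def __init__(self, obs_dim=4, num_actions=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, obs):
        return self.net(obs)

q_net = QNetwork()
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

# One Q-learning style update on a single hypothetical transition.
obs, next_obs = torch.randn(1, 4), torch.randn(1, 4)
action, reward, done, gamma = 0, 1.0, False, 0.99

q_value = q_net(obs)[0, action]
with torch.no_grad():
    bootstrap = 0.0 if done else gamma * q_net(next_obs).max().item()
target = reward + bootstrap

loss = (q_value - target) ** 2   # squared TD error
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Non-stationary targets and correlated consecutive transitions are exactly the issues that techniques such as target networks and replay buffers (e.g. in DQN) address.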
Course Focus
Exploration of principles and algorithms in RL
Core topics include exploration, MDPs, dynamic programming, model-free algorithms, policy gradients, actor-critic methods, deep RL, integration of learning and planning.
Conclusion
Reinforcement learning formalizes the AI problem.
Huge potential if generic algorithms are successful