Hado van Hasselt, Research Scientist at DeepMind, London
Teaches course at UCL
Pre-recorded due to COVID-19 pandemic
Co-lecturers: Diana Borsa, Matteo Hessel
Course Structure
Course topic: Reinforcement Learning (RL)
Recommended book: "Reinforcement Learning: An Introduction" by Richard Sutton and Andrew Barto
Course admin for UCL credit students:
Use Moodle for updates and communication
Assignments graded, no exams
Introduction to Reinforcement Learning
What is Reinforcement Learning?
A method for machines to learn how to make decisions and achieve goals through interaction.
Involves concepts of learning, autonomy, and decision-making.
Relation to AI
AI aims to find solutions autonomously.
Historical context: From the Industrial Revolution (physical automation) to the Digital Revolution (mental automation) to AI (autonomous solution-finding).
Alan Turing's 1950 paper "Computing Machinery and Intelligence" proposes building a machine that learns like a child, rather than programming adult-level intelligence directly.
Key Concepts
Artificial Intelligence
Goal: Learn to make decisions to achieve goals.
RL serves as a formal framework for this decision-making view of AI.
Reinforcement Learning Characteristics
Active learning, interaction with environment
Sequential interactions
Goal-directed actions
Can learn without examples of optimal behavior
Fundamental Components of RL
Interaction Loop
Agent interacts with the environment
Agent executes actions and observes resulting changes.
Goal: Optimize long-term cumulative reward.
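A minimal sketch of this interaction loop, assuming generic environment/agent objects with reset, step, act, and observe methods (all names are illustrative, not from the course):

```python
# Minimal agent-environment interaction loop (illustrative sketch; `env` and `agent`
# are hypothetical objects, not part of the course material).

def run_episode(env, agent):
    """Run one episode and return the cumulative (undiscounted) reward."""
    observation = env.reset()                             # initial observation
    total_reward = 0.0
    done = False
    while not done:
        action = agent.act(observation)                    # agent selects an action
        observation, reward, done = env.step(action)       # environment returns observation, reward, termination
        agent.observe(action, observation, reward, done)   # agent may update its internal state / learn
        total_reward += reward
    return total_reward
```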
Reward Hypothesis
Any goal can be formalized as the outcome of maximizing a cumulative reward signal.
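One standard formalization of the cumulative reward being maximized is the discounted return (following Sutton & Barto; not a verbatim quote from the lecture):

```latex
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots
    = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1},
\qquad 0 \le \gamma \le 1
```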
Examples of RL Applications
Flying helicopters, managing investment portfolios, controlling power stations, making robots walk, playing games.
Example: Atari games used to test RL algorithms like DQN.
Reinforcement Learning Problem
Defined by learning to make decisions from interaction.
Involves prediction and control.
Ambitious in scope but promising for generic problem-solving.
Reinforcement Learning Components
Agent State
The agent's internal state: a summary of its interaction history used to select actions.
Environments may be fully observable (the agent state can simply be the latest observation) or partially observable (the agent must construct its state from the history of observations).
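One common way to formalize the agent-state update (a hedged sketch; u is a generic state-update function, e.g. a recurrent network):

```latex
S_t = u\!\left(S_{t-1}, A_{t-1}, R_t, O_t\right)
```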
Policy
Maps states to actions.
Can be deterministic or stochastic.
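A small illustrative sketch of the two policy types (the state/action spaces and probabilities are made up for illustration):

```python
import numpy as np

def deterministic_policy(state):
    """Maps each state to exactly one action (hypothetical lookup table)."""
    action_table = {0: 1, 1: 0, 2: 1}          # state -> action
    return action_table[state]

def stochastic_policy(state, rng=np.random.default_rng(0)):
    """Maps each state to a distribution over actions and samples from it."""
    action_probs = {0: [0.9, 0.1], 1: [0.5, 0.5], 2: [0.2, 0.8]}  # pi(a | s)
    probs = action_probs[state]
    return rng.choice(len(probs), p=probs)
```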
Value Function
Expected return from a state under a policy.
Satisfies a recursive relation (the Bellman equation) and can be used for policy evaluation.
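The recursion referred to above is the Bellman expectation equation for the state-value function (standard form, stated here for reference):

```latex
v_\pi(s) = \mathbb{E}_\pi\!\left[ R_{t+1} + \gamma\, v_\pi(S_{t+1}) \,\middle|\, S_t = s \right]
```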
Model
Predicts environment dynamics.
Used for planning optimal strategies.
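A hedged sketch of a simple learned tabular model estimated from observed counts (all class and method names are illustrative assumptions):

```python
from collections import defaultdict

class TabularModel:
    """Approximates transition probabilities and expected rewards from observed transitions."""

    def __init__(self):
        self.transition_counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
        self.reward_sums = defaultdict(float)                           # (s, a) -> summed reward
        self.visit_counts = defaultdict(int)                            # (s, a) -> visits

    def update(self, state, action, reward, next_state):
        self.transition_counts[(state, action)][next_state] += 1
        self.reward_sums[(state, action)] += reward
        self.visit_counts[(state, action)] += 1

    def expected_reward(self, state, action):
        n = self.visit_counts[(state, action)]
        return self.reward_sums[(state, action)] / n if n else 0.0

    def transition_probs(self, state, action):
        n = self.visit_counts[(state, action)]
        counts = self.transition_counts[(state, action)]
        return {s2: c / n for s2, c in counts.items()} if n else {}
```

A planner can then use transition_probs and expected_reward to simulate experience without touching the real environment.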
Categories of RL Agents
Value-Based: Learns value function, policy derived from it.
Policy-Based: Learns policy directly.
Actor-Critic: Both value function and policy are explicitly represented.
Model-Free: No explicit model of environment.
Model-Based: Includes explicit model for planning.
Sub-Problems in RL
Prediction: Evaluate value function for a given policy.
Control: Optimize policy to maximize rewards.
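A minimal sketch of the prediction sub-problem via every-visit Monte-Carlo evaluation (sample_episode is a hypothetical helper returning a list of (state, reward) pairs for one episode under the given policy):

```python
from collections import defaultdict

def monte_carlo_evaluation(sample_episode, policy, num_episodes=1000, gamma=0.99):
    """Estimate v_pi(s) by averaging sampled discounted returns from each visited state."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(num_episodes):
        episode = sample_episode(policy)          # [(state, reward), ...] for one episode
        g = 0.0
        for state, reward in reversed(episode):   # accumulate the return backwards in time
            g = reward + gamma * g
            returns_sum[state] += g
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}
```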
Learning vs. Planning
Learning: The environment is initially unknown; the agent improves by gathering experience and updating its knowledge.
Planning: The agent uses internal computation over a (given or learned) model to improve its policy or value estimates without further interaction.
Use of Deep Learning
Deep learning is integrated as function approximation to handle large-scale RL problems.
Challenges include non-stationarity and data correlation in RL.
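A hedged sketch of combining deep learning with RL: a small Q-value network and a single TD-style update in PyTorch (the architecture, dimensions, and transition values are assumptions, not the course's reference implementation):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps an observation to one Q-value per discrete action."""
    def __init__(self, obs_dim=4, num_actions=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, obs):
        return self.net(obs)

q_net = QNetwork()
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

# One Q-learning style update on a single hypothetical transition.
obs, next_obs = torch.randn(1, 4), torch.randn(1, 4)
action, reward, done, gamma = 0, 1.0, False, 0.99

q_value = q_net(obs)[0, action]
with torch.no_grad():
    bootstrap = 0.0 if done else gamma * q_net(next_obs).max().item()
target = reward + bootstrap

loss = (q_value - target) ** 2   # squared TD error
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Non-stationary targets and correlated consecutive transitions are exactly the issues that techniques such as target networks and replay buffers (e.g. in DQN) address.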
Course Focus
Exploration of principles and algorithms in RL
Core topics include exploration, MDPs, dynamic programming, model-free algorithms, policy gradients, actor-critic methods, deep RL, integration of learning and planning.
Conclusion
Reinforcement learning formalizes the AI problem.
Huge potential if generic algorithms are successful