Question 1
How does the value function V represent the cumulative reward in reinforcement learning?
Question 2
What are the main challenges with Q-Learning in reinforcement learning?
Question 3
Which technique is used in Deep Q-Learning to store and reuse experiences?
Question 4
What are the key components of a Markov Decision Process (MDP)?
Question 5
How is the initial state chosen when an episode begins in an MDP?
Question 6
What problem is illustrated by the Cart-Pole example in reinforcement learning?
Question 7
What kind of update does the actor perform in actor-critic methods?
Question 8
How does the Q-Learning algorithm update its Q-values?
Question 9
What does the policy π in reinforcement learning represent?
Question 10
What kind of function does a policy in reinforcement learning typically represent?
Question 11
In reinforcement learning, what does the Bellman equation help to compute?
Question 12
How does AlphaGo use reinforcement learning techniques to achieve its performance?
Question 13
How do policy gradient methods reduce the variance of their gradient estimates?
Question 14
What is the significance of using convolutional layers in Deep Q-Learning for Atari games?
Question 15
What problem does the REINFORCE algorithm aim to solve?