
YouTube

Scaling Deep Q-Learning - Part 1

Montreal Robotics via YouTube

Overview

This lecture segment traces the path from simple bandits to Q-learning, providing a comprehensive overview of core reinforcement learning challenges and solutions. Begin with multi-armed bandits and the fundamental exploration-exploitation dilemma, learning about strategies such as epsilon-greedy and upper confidence bound (UCB) that balance these competing objectives. Progress to contextual bandits, which incorporate state information, and then to full Q-learning, which learns state-dependent action values and policies for sequential decisions. Discover the advantages of Q-learning over policy gradients, including its ability to learn from off-policy data and its lower variance. Explore approximate dynamic programming, understanding how value iteration and policy iteration train Q-functions. Examine the computational challenges of these approaches, particularly the cost of performing an argmax over all possible actions, and learn how policy iteration reduces this cost by bootstrapping on previous policies. The 37-minute lecture concludes by suggesting potential efficiency gains from combining policy evaluation and policy improvement into a single step.
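As a rough illustration of the epsilon-greedy and UCB strategies mentioned above, here is a minimal Python sketch of both on a toy multi-armed bandit. This is not code from the lecture; the arm means, noise model, step counts, and exploration constants are illustrative assumptions.

```python
import math
import random

def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Gaussian bandit: explore a random arm with
    probability epsilon, otherwise exploit the arm with the best estimate."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n
    estimates = [0.0] * n
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                               # explore
        else:
            arm = max(range(n), key=lambda a: estimates[a])      # exploit
        reward = true_means[arm] + rng.gauss(0, 1)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
    return estimates, counts

def ucb_bandit(true_means, steps=5000, c=2.0, seed=0):
    """UCB: pick the arm maximizing estimate + exploration bonus, so
    rarely tried arms get an optimism boost that shrinks with visits."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n
    estimates = [0.0] * n
    for t in range(1, steps + 1):
        if t <= n:
            arm = t - 1  # pull each arm once to initialize
        else:
            arm = max(range(n),
                      key=lambda a: estimates[a]
                      + c * math.sqrt(math.log(t) / counts[a]))
        reward = true_means[arm] + rng.gauss(0, 1)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

# Both strategies should concentrate pulls on the best arm (mean 0.8).
eg_est, eg_counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
ucb_est, ucb_counts = ucb_bandit([0.2, 0.5, 0.8])
```

Epsilon-greedy keeps exploring at a fixed rate forever, while UCB's bonus term decays per arm as evidence accumulates, which is the trade-off the lecture highlights.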
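The Q-learning ideas the lecture discusses, namely off-policy updates that bootstrap via an argmax over next-state action values, can be sketched on a tiny chain MDP. This toy environment and all constants are hypothetical, not the lecture's; it only illustrates the update rule.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a chain: states 0..n-1, actions 0=left, 1=right.
    Reward 1 only for reaching the rightmost (terminal) state."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]  # Q[state][action]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Behavior policy: epsilon-greedy over current Q.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Off-policy bootstrap: the target uses the argmax (here, max)
            # over next-state actions, regardless of the action actually taken.
            terminal = (s2 == n_states - 1)
            target = r + (0.0 if terminal else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning_chain()
# The greedy policy read off Q should move right in every non-terminal state.
policy = ["right" if Q[s][1] > Q[s][0] else "left" for s in range(4)]
```

The `max(Q[s2])` term is the cheap tabular version of the argmax the lecture flags as expensive: with a handful of discrete actions it is a trivial scan, but with large or continuous action spaces that same maximization dominates the cost of each update.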

Syllabus

RobotLearning: Scaling Deep Q-Learning Part 1

Taught by

Montreal Robotics

