
Scaling Policy Gradients Part 2

Montreal Robotics via YouTube

Overview

This lecture continues the exploration of policy gradients in reinforcement learning, focusing on variance reduction techniques for more efficient gradient estimation. Learn about reward-to-go and critic functions (both value functions and Q-functions) as ways to improve baselines in policy optimization. Examine the bias-variance trade-off in critic design and see how N-step returns, which combine model-based bootstrapping with data-driven rollouts, can significantly improve policy learning performance. The presentation also addresses practical implementation challenges when applying policy gradients in deep learning frameworks, referencing a blog post that documents key implementation details. Finally, explore the AlphaStar project as a case study of how supervised learning, TD-Lambda, V-trace, and distributed reinforcement learning were integrated to train a sophisticated policy for StarCraft, and consider the challenges of training agents in vast state spaces along with the substantial computational resources such tasks require.
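The reward-to-go and N-step return estimators discussed above can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the lecture: the function names and the simple per-trajectory loops are my own, and the `values` array stands in for a learned critic's estimates. Reward-to-go keeps only rewards from timestep t onward (lower variance than the full return), while N-step returns truncate after n rewards and bootstrap from the critic, trading some bias for further variance reduction.

```python
import numpy as np

def reward_to_go(rewards, gamma=0.99):
    """Discounted reward-to-go: G_t = sum_{k >= t} gamma^(k - t) * r_k.
    Computed with a single backward pass over the trajectory."""
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out

def n_step_returns(rewards, values, n, gamma=0.99):
    """N-step return: n discounted rewards plus a bootstrapped critic
    estimate V(s_{t+n}), trading bias (from the critic) for variance.
    `values` is assumed to hold the critic's estimate for each state."""
    T = len(rewards)
    out = np.zeros(T)
    for t in range(T):
        g, discount = 0.0, 1.0
        for k in range(t, min(t + n, T)):
            g += discount * rewards[k]
            discount *= gamma
        if t + n < T:
            g += discount * values[t + n]  # bootstrap from the critic
        out[t] = g
    return out
```

With n equal to the trajectory length, the N-step return reduces to plain reward-to-go; with n = 1 it leans almost entirely on the critic, which is the bias-variance dial the lecture describes.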

Syllabus

RobotLearning: Scaling Policy Gradients Part 2

Taught by

Montreal Robotics

