Overview
This lecture continues the exploration of policy gradients in reinforcement learning, focusing on variance reduction techniques for more efficient gradient estimation. Learn about reward-to-go and critic functions (both value and Q-functions) as methods to improve baseline estimates in policy optimization. Examine the important bias-variance trade-off in critic design and discover how combining model-based and data-driven approaches through N-step returns can significantly enhance policy learning performance.

The presentation addresses practical implementation challenges when applying policy gradients with deep learning frameworks, referencing a detailed blog post with crucial implementation details. Explore the groundbreaking AlphaStar project as a case study, seeing how supervised learning, TD-Lambda, V-trace, and distributed reinforcement learning training were integrated to successfully train a sophisticated policy for StarCraft. The discussion also covers the significant challenges of training reinforcement learning agents in vast state spaces and the substantial computational resources required for such complex learning tasks.
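The two estimators named above can be sketched in a few lines. The snippet below is a minimal illustration (not from the lecture materials): `reward_to_go` computes the discounted return from each timestep onward, which replaces the full-trajectory return in the policy gradient to reduce variance, and `n_step_returns` combines `n` observed rewards with a bootstrapped critic value, trading variance for bias as `n` shrinks. The function names and signatures are hypothetical.

```python
import numpy as np

def reward_to_go(rewards, gamma=0.99):
    """Discounted reward-to-go: G_t = sum_{k>=t} gamma^(k-t) * r_k.

    Computed with a single backward pass over the trajectory.
    """
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def n_step_returns(rewards, values, n, gamma=0.99):
    """N-step return: n discounted rewards, then bootstrap with the critic.

    `values` are critic estimates V(s_t) for each state in the trajectory.
    Larger n -> lower bias, higher variance; smaller n leans on the critic.
    """
    T = len(rewards)
    out = np.zeros(T)
    for t in range(T):
        end = min(t + n, T)
        g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
        if t + n < T:  # bootstrap only if the trajectory extends past t+n
            g += gamma ** n * values[t + n]
        out[t] = g
    return out
```

For example, with undiscounted rewards `[1, 1, 1]`, `reward_to_go` gives `[3, 2, 1]`, while a 1-step return at each state is the immediate reward plus the critic's estimate of the next state.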
Syllabus
RobotLearning: Scaling Policy Gradients Part 2
Taught by
Montreal Robotics