Overview
This lecture continues the exploration of policy gradients in reinforcement learning, focusing on variance reduction techniques for more efficient gradient estimation. Learn about reward-to-go and critic functions (both value and Q-functions) as methods to improve baseline estimates in policy optimization. Examine the important bias-variance trade-off in critic design and discover how combining model-based and data-driven approaches through N-step returns can significantly enhance policy learning performance.

The presentation addresses practical implementation challenges when applying policy gradients with deep learning frameworks, referencing a detailed blog post with crucial implementation details. Explore the groundbreaking AlphaStar project as a case study, seeing how supervised learning, TD-Lambda, V-trace, and distributed reinforcement learning training were integrated to successfully train a sophisticated policy for StarCraft. The discussion also covers the significant challenges of training reinforcement learning agents in vast state spaces and the substantial computational resources required for such complex learning tasks.
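The two estimators named above can be sketched in a few lines. The snippet below is a minimal illustration (not from the lecture materials): `reward_to_go` computes the discounted return from each timestep onward, which replaces the full-trajectory return in the policy gradient to reduce variance, and `n_step_returns` combines `n` observed rewards with a bootstrapped critic value, trading variance for bias as `n` shrinks. The function names and signatures are hypothetical.

```python
import numpy as np

def reward_to_go(rewards, gamma=0.99):
    """Discounted reward-to-go: G_t = sum_{k>=t} gamma^(k-t) * r_k.

    Computed with a single backward pass over the trajectory.
    """
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def n_step_returns(rewards, values, n, gamma=0.99):
    """N-step return: n discounted rewards, then bootstrap with the critic.

    `values` are critic estimates V(s_t) for each state in the trajectory.
    Larger n -> lower bias, higher variance; smaller n leans on the critic.
    """
    T = len(rewards)
    out = np.zeros(T)
    for t in range(T):
        end = min(t + n, T)
        g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
        if t + n < T:  # bootstrap only if the trajectory extends past t+n
            g += gamma ** n * values[t + n]
        out[t] = g
    return out
```

For example, with undiscounted rewards `[1, 1, 1]`, `reward_to_go` gives `[3, 2, 1]`, while a 1-step return at each state is the immediate reward plus the critic's estimate of the next state.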
Syllabus
RobotLearning: Scaling Policy Gradients Part 2
Taught by
Montreal Robotics