Overview
This 18-minute video explores how AI agents develop learning strategies through gradient optimization techniques. It covers the role of exploration in reinforcement learning, where agents discover good policies by trial and error, and explains why sparse-reward environments are hard and why traditional exploration methods such as noise injection often fall short. It then introduces intrinsic rewards and their two main uses: combining them with extrinsic rewards for policy optimization, and training sub-policies for hierarchical learning. Each use has a drawback: combining rewards can make credit assignment unstable, while training sub-policies tends to be sample-inefficient and can lead to sub-optimal behavior. The video closes with recent research from MMLab at CUHK and Meituan on reasoning reward models for agents, and from the University of Illinois on intrinsic reward policy optimization for sparse-reward environments.
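As a rough illustration of the first use of intrinsic rewards (adding them to the extrinsic reward), here is a minimal Python sketch. It assumes a count-based novelty bonus of the form beta / sqrt(N(s)), which is one common textbook choice and not necessarily the method discussed in the video. The names `CountBonus`, `shaped_reward`, and `beta` are hypothetical and used only for this example.

```python
import math
from collections import defaultdict


class CountBonus:
    """Hypothetical count-based intrinsic reward: beta / sqrt(N(s)).

    Assumption for illustration only; states must be hashable.
    """

    def __init__(self, beta=0.1):
        self.beta = beta
        self.counts = defaultdict(int)  # visit count per state

    def __call__(self, state):
        # Record the visit, then return a bonus that shrinks as the
        # state becomes familiar.
        self.counts[state] += 1
        return self.beta / math.sqrt(self.counts[state])


def shaped_reward(extrinsic, state, bonus):
    """Total reward seen by the learner: extrinsic plus intrinsic bonus."""
    return extrinsic + bonus(state)


if __name__ == "__main__":
    bonus = CountBonus(beta=1.0)
    # In a sparse-reward task the extrinsic reward is usually zero,
    # so the bonus is the only learning signal early on.
    for visit in range(1, 5):
        print(visit, shaped_reward(0.0, "s0", bonus))
```

Because the bonus decays with repeated visits, the total reward for the same state changes over time. That non-stationarity is one intuition for why mixing intrinsic and extrinsic rewards can make credit assignment unstable.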
Syllabus
Smarter AI Gradients: How Agents Learn to Think
Taught by
Discover AI