Overview
Explore the infrastructure powering next-generation AI agents in this 23-minute podcast episode on reinforcement learning and fine-tuning on Google's TPU architecture. Learn when to choose fine-tuning over prompt engineering, with a focus on specialization, privacy, and cost. Follow the complete model lifecycle through a clear breakdown of pre-training versus post-training, including supervised fine-tuning (SFT) and reinforcement learning (RL), illustrated with Andrej Karpathy's chemistry textbook analogy. Understand when and why to use reinforcement learning, what it adds to model alignment and safety, and why 2025 is described as the year of RL, with examples from DeepSeek-R1, Grok 4, and Gemini 3. Examine how TPU pods and Inter-Chip Interconnect (ICI) address bottlenecks in large-scale fine-tuning, and how they relate to the infrastructure, algorithm, and orchestration challenges of RL. Watch a hands-on demonstration of MaxText 2.0 running a GRPO (Group Relative Policy Optimization) job on TPUs. Gain insights into scaling to 1000+ chips and into how Google's TPU architecture is designed for efficiency on complex AI workloads, with commentary from Google TPU Training Team Product Manager Kyle Meggs alongside hosts Shir Meir Lador and Don McCasland.
Syllabus
- Introduction: Gemini 3 and the rise of TPUs
- Why fine-tune? Specialization and privacy
- What is fine-tuning? SFT and RL explained
- What is RL and why do we need it?
- The added value in RL
- Industry pulse: Why 2025 is the year of RL (DeepSeek-R1, Grok 4, Gemini 3)
- The challenges of RL: Infrastructure, algorithms, and orchestration
- Factory floor: How TPUs are designed for scale
- [Demo] Reinforcement Learning GRPO with MaxText 2.0 on TPUs
- Scaling to 1000+ chips and season wrap up
Taught by
Google Cloud Tech