Overview
Syllabus
[00:00] - Introduction to building reliable agents with RL.
[00:49] - Case Study: ART-E, an AI email assistant.
[02:19] - The importance of starting with prompted models before moving to RL.
[03:17] - Performance improvements of RL over prompted models.
[05:18] - Cost and latency benefits of the RL approach.
[08:02] - The two hardest problems in modern RL: realistic environments and reward functions.
[13:13] - Optimizing agent behavior with "extra rewards."
[15:25] - The problem of "reward hacking" and how to address it.
[18:37] - The solution to reward hacking.
Taught by
AI Engineer