Overview
Syllabus
00:00 - The acceleration of AI viewed through programming agents
01:46 - Level-setting on AI agents: nomenclature, evals, and experimentation
03:18 - Experimentation types and processes
05:30 - Optimizing each step of the experiment loop
08:10 - Bringing W&B Launch back to solve knotty experimentation problems
11:50 - Optimizing the research phase in Weave
14:43 - Can we use AI to automatically improve the experimental loop?
17:39 - What changes with the resurgence of reinforcement learning
19:12 - OpenPipe’s Kyle Corbitt on building reliable agents with RL
23:14 - Overcoming the limitations of evals with researcher agents
26:08 - How close are we to self-improving AI?
Taught by
Weights & Biases