Overview
Explore the fundamental limitations of large language models in code review through this 24-minute conference talk, which examines why transformer models struggle to understand code intent and impact. Discover how LLMs, while capable of code summarization and autocompletion, often exhibit overconfidence, repetition, and poor judgment when reviewing code. Learn about advanced techniques, including reasoning phases, memory systems, and reflection mechanisms, that can improve precision and build trust in AI-powered developer tools. Examine real-world examples from Baz's code review agent to see how these concepts work in practice. Gain insight into benchmarking methodologies, task-splitting strategies, and the critical importance of context in AI code analysis. Understand what separates an effective code reviewer from a basic automated tool, and discover key takeaways for moving beyond simple prompt engineering toward reliable, trustworthy coding agents for software development workflows.
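To make the reasoning-and-reflection idea concrete, here is a minimal Python sketch of what such a review loop can look like. It is an illustration only, not Baz's actual implementation: the `call_llm` client, the `parse_findings` format, the prompt wording, and the 0.7 confidence threshold are all hypothetical assumptions.

```python
from dataclasses import dataclass

@dataclass
class ReviewFinding:
    file: str
    comment: str
    confidence: float  # the model's self-rated confidence, 0..1

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model client call."""
    raise NotImplementedError("wire up a model provider here")

def parse_findings(text: str) -> list[ReviewFinding]:
    """Naive parser assuming one 'file | confidence | comment' line per finding."""
    findings = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            try:
                findings.append(ReviewFinding(parts[0], parts[2], float(parts[1])))
            except ValueError:
                continue
    return findings

def review_with_reflection(diff: str, threshold: float = 0.7) -> list[ReviewFinding]:
    # Reasoning phase: summarize the intent of the change first, so the
    # review comments are grounded in what the author was trying to do.
    intent = call_llm(f"Summarize the intent of this diff:\n{diff}")

    # Drafting phase: propose findings with that intent as context.
    draft = call_llm(f"Intent: {intent}\nReview this diff and list issues as "
                     f"'file | confidence | comment' lines:\n{diff}")

    # Reflection phase: the model critiques its own draft; low-confidence
    # findings are dropped rather than shown, trading recall for trust.
    critique = call_llm(f"Re-check each finding against the diff; keep only "
                        f"clearly correct ones, same format:\n{draft}")

    return [f for f in parse_findings(critique) if f.confidence >= threshold]
```

The filtering step reflects the talk's central trade-off: suppressing low-confidence findings sacrifices some recall, but it addresses the overconfidence problem that erodes developers' trust in automated reviewers.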
Syllabus
00:00 Getting Started
01:23 Code Review Agent
06:40 Benchmarking Insights
12:50 Task Splitting
15:10 Context Matters
21:11 Awesome Reviewers
22:57 Key Takeaways
Taught by
WeAreDevelopers