Overview
Learn best practices for rapidly evaluating and iterating on large language model applications at scale, drawing on insights from Discord's engineering team. This 18-minute conference talk covers the development workflows and evaluation methodologies essential for measuring model and prompt improvements, mitigating risks, and accelerating development cycles. Explore the refined best practices implemented internally at Discord, discover the tooling and automation strategies that enable consistent shipping of improvements, and understand the unexpected challenges that emerge when deploying LLMs in production. Gain practical knowledge of common launch blockers, evaluation frameworks, balancing cost against accuracy, building an evaluation culture within engineering teams, implementing effective observability, creating feedback loops, managing prompts efficiently, and conducting red team exercises to identify vulnerabilities and edge cases in LLM-powered applications.
Syllabus
Introduction
Biggest repeat launch blockers
Evals
Keep it simple
Cost vs accuracy
Building an eval culture
Observability
Feedback Loop
Prompt Management
Red teaming
Taught by
AI Engineer