Syllabus
0:00 - Introduction: Why AI Observability Matters Now
1:15 - Live Demo Preview: RAG with Llama Stack & Safety Features
4:30 - The State of AI in Enterprise: Moving from Research to Business-Critical
6:55 - Unique Monitoring Challenges Posed by LLMs
9:15 - Prefill vs. Decode: The Core Difference in LLM Serving Patterns
12:05 - Building the Open-Source Stack: Prometheus, Grafana, Tempo, and OTel
15:00 - Kubernetes Deep Dive: ServiceMonitors Explained
18:45 - Deploying the Model: Using llm-d for vLLM Quick Start
22:10 - Configuring Tracing with Llama Stack and OTel Sidecars
27:50 - Critical Signals to Monitor: Performance, Cost, and Quality
32:00 - Live Demo: Analyzing GPU Usage, vLLM Dashboards & Traces in Grafana
37:45 - Q&A: Open-Source Cost, Langfuse, and Actionable Analytics for Different Personas
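For the ServiceMonitor segment (15:00), a minimal sketch of the kind of manifest discussed: a Prometheus Operator `ServiceMonitor` that scrapes a vLLM service's metrics endpoint. All names, labels, and the namespace below are illustrative assumptions, not details taken from the talk.

```yaml
# Hypothetical ServiceMonitor directing the Prometheus Operator to scrape
# a vLLM Service's /metrics endpoint. Labels, names, and namespaces are
# illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: vllm-metrics          # hypothetical name
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: vllm               # assumed label on the vLLM Service
  namespaceSelector:
    matchNames:
      - llm-serving           # assumed namespace of the model deployment
  endpoints:
    - port: metrics           # assumed port name exposing Prometheus metrics
      interval: 15s
```

Once applied, Prometheus discovers matching Services automatically, and the scraped vLLM metrics can back the Grafana dashboards shown in the demo.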
Taught by
InfoQ