Syllabus
0:00 - Introduction: Why AI Observability Matters Now
1:15 - Live Demo Preview: RAG with Llama Stack & Safety Features
4:30 - The State of AI in Enterprise: Moving from Research to Business-Critical
6:55 - Unique Monitoring Challenges Posed by LLMs
9:15 - Prefill vs. Decode: The Core Difference in LLM Serving Patterns
12:05 - Building the Open-Source Stack: Prometheus, Grafana, Tempo, and OTel
15:00 - Kubernetes Deep Dive: ServiceMonitors Explained
18:45 - Deploying the Model: Using llm-d for vLLM Quick Start
22:10 - Configuring Tracing with Llama Stack and OTel Sidecars
27:50 - Critical Signals to Monitor: Performance, Cost, and Quality
32:00 - Live Demo: Analyzing GPU Usage, vLLM Dashboards & Traces in Grafana
37:45 - Q&A: Open-Source Cost, Langfuse, and Actionable Analytics for Different Personas
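For the ServiceMonitor segment (15:00), a minimal sketch of the kind of manifest discussed: a Prometheus Operator `ServiceMonitor` that scrapes a vLLM service's metrics endpoint. All names, labels, and the namespace below are illustrative assumptions, not details taken from the talk.

```yaml
# Hypothetical ServiceMonitor directing the Prometheus Operator to scrape
# a vLLM Service's /metrics endpoint. Labels, names, and namespaces are
# illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: vllm-metrics          # hypothetical name
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: vllm               # assumed label on the vLLM Service
  namespaceSelector:
    matchNames:
      - llm-serving           # assumed namespace of the model deployment
  endpoints:
    - port: metrics           # assumed port name exposing Prometheus metrics
      interval: 15s
```

Once applied, Prometheus discovers matching Services automatically, and the scraped vLLM metrics can back the Grafana dashboards shown in the demo.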
Taught by
InfoQ