How To Supercharge AI/ML Observability With OpenTelemetry and Fluent Bit

This conference talk shows how to build an open source observability stack for AI/ML workloads using Fluent Bit and OpenTelemetry. Learn strategies for keeping models performant and reliable in production, especially on Kubernetes: logging and debugging popular models such as GPT, BERT, and custom LLMs; tracking prompts and their results to gain actionable insights; and monitoring agent performance. See how OpenTelemetry's tracing and error stack trace capture combine with Fluent Bit's resource-efficient log processing, live tail, and metrics scraping to form a comprehensive observability solution. This 18-minute presentation from the Cloud Native Computing Foundation (CNCF) offers practical tools and approaches for AI/ML practitioners who want to improve system reliability and performance.
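
To make "tracking prompts and their results" concrete, here is a minimal sketch that is not taken from the talk. It writes one JSON log line per model call, carrying the prompt, the response, the latency, and any error stack trace. It uses only the Python standard library; the function name `log_llm_call`, the field names, and the stand-in models are all hypothetical, and the trace ID is generated locally as an assumption rather than taken from an OpenTelemetry SDK. A log collector such as Fluent Bit could then tail and parse output of this shape.

```python
import json
import secrets
import time
import traceback


def log_llm_call(prompt, call_model, stream):
    """Call the model and write one JSON log line describing the call."""
    record = {
        # 32 hex characters, the same shape as a W3C trace ID (assumption:
        # a real setup would reuse the ID of the active trace instead).
        "trace_id": secrets.token_hex(16),
        "prompt": prompt,
        "response": None,
        "error": None,
    }
    start = time.perf_counter()
    try:
        record["response"] = call_model(prompt)
    except Exception:
        # Keep the full stack trace so the failure can be debugged later.
        record["error"] = traceback.format_exc()
    record["latency_ms"] = round((time.perf_counter() - start) * 1000, 3)
    stream.write(json.dumps(record) + "\n")
    return record


def fake_model(prompt):
    # Hypothetical stand-in for a real model client.
    return prompt.upper()


def broken_model(prompt):
    # Hypothetical stand-in for a failing model call.
    raise RuntimeError("model unavailable")


if __name__ == "__main__":
    import sys

    log_llm_call("hello", fake_model, sys.stdout)
    log_llm_call("hello", broken_model, sys.stdout)
```

Writing one self-describing JSON object per line keeps the application side simple and leaves routing, filtering, and enrichment to the log pipeline.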