Overview
Learn to trace and optimize Large Language Model (LLM) performance in real time using OpenTelemetry's profiling capabilities in this 27-minute conference talk from the Linux Foundation. Discover how to identify resource-intensive code that consumes excessive CPU, GPU, and memory, creates bottlenecks, or degrades performance in LLM applications. Master techniques for dynamically inspecting application behavior and performance at runtime. Explore methods to pinpoint the specific code segments responsible for excessive resource consumption, memory leaks, and out-of-memory errors. Understand how to improve LLM performance by analyzing model behavior, reducing latency, and meeting service-level agreements and objectives. Gain insights into achieving efficient Kubernetes deployments that optimize resource utilization and reduce costs. Develop skills in leveraging OpenTelemetry's profiling features to optimize LLM code at a granular level, enabling better resource management and performance monitoring in production environments.
Syllabus
Who Ate My Resources? Trace LLM Performance in Real Time With OTel - Aditya Soni & Seema Saharan
Taught by
Linux Foundation