Optimizing LLM Efficiency One Trace at a Time on Kubernetes
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn how to optimize Large Language Model (LLM) deployments on Kubernetes through a 25-minute conference talk from CNCF experts. Discover techniques for using OpenTelemetry's profiling capabilities to identify resource-intensive code segments, detect memory leaks, and prevent out-of-memory errors in LLM applications. Master the art of dynamic runtime inspection to improve model performance, reduce latency, and meet service level agreements. Gain practical insights into achieving efficient Kubernetes deployments while optimizing resource utilization and controlling costs. Explore methods for deep-level code analysis that enable precise identification of performance bottlenecks and resource drains in LLM implementations.
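The talk covers tracing LLM workloads to find hot code paths and memory leaks. As a rough illustration of the idea (not the talk's actual code, and using Python's stdlib `tracemalloc` rather than OpenTelemetry's profiling signal), here is a minimal span-like wrapper that records latency and peak memory around a hypothetical decode loop:

```python
import time
import tracemalloc

def traced(name, spans):
    """Minimal span-like context manager: records wall time and peak
    memory, loosely mimicking what a trace/profile exporter would surface."""
    class _Span:
        def __enter__(self):
            tracemalloc.start()
            self.t0 = time.perf_counter()
            return self
        def __exit__(self, *exc):
            elapsed = time.perf_counter() - self.t0
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            spans.append({"name": name, "seconds": elapsed, "peak_bytes": peak})
    return _Span()

def generate_tokens(prompt, n=1000):
    # Hypothetical stand-in for an LLM decode loop; the ever-growing list
    # simulates a KV-cache that is never freed (a leak candidate a
    # profiler would flag as a resource drain).
    cache = []
    for _ in range(n):
        cache.append(prompt * 10)
    return len(cache)

spans = []
with traced("llm.decode", spans):
    generate_tokens("hello ", n=2000)

for s in spans:
    print(f"{s['name']}: {s['seconds']:.4f}s, peak {s['peak_bytes']} bytes")
```

In a real deployment the same wrapping would be done with the OpenTelemetry SDK so the spans and profiles are exported to a collector and correlated with Kubernetes pod metrics, instead of printed locally.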
Syllabus
Optimizing LLM Efficiency One Trace at a Time on Kubernetes - Aditya Soni, Forrester & Seema Saharan
Taught by
CNCF [Cloud Native Computing Foundation]