Scaling GenAI Inference - Techniques, Optimizations, and Real-World Lessons

Weights & Biases via YouTube

Overview

Learn practical strategies for taking generative AI models from research prototypes to production-grade systems in this 16-minute conference talk. Discover how to overcome the core obstacles in GenAI inference, reducing latency and controlling costs without sacrificing model quality. Explore optimization techniques including batching, model quantization, parallelism, KV cache management, and speculative decoding. Drawing on real-world experience, the session unpacks the key trade-offs, common pitfalls, and lessons learned from scaling inference systems in production.
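As a rough illustration of why KV cache management and batching are tied together, the memory held by the KV cache can be estimated with simple arithmetic: two tensors (keys and values) per layer, per KV head, per head dimension, per cached token. The sketch below is not taken from the talk. The model dimensions (32 layers, 32 KV heads, head dimension 128, 16-bit values, roughly the shape of a 7B-parameter decoder) and the 40 GiB memory budget are assumptions chosen only for illustration, and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    """Hypothetical decoder dimensions used only for this estimate."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_elem: int  # 2 for fp16/bf16, 1 for an 8-bit quantized cache


def kv_cache_bytes(cfg: DecoderConfig, seq_len: int, batch_size: int) -> int:
    """Bytes needed to cache keys and values for every token in the batch."""
    # Factor of 2: one key tensor and one value tensor per layer.
    per_token = 2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim * cfg.bytes_per_elem
    return per_token * seq_len * batch_size


def max_batch_size(cfg: DecoderConfig, seq_len: int, budget_bytes: int) -> int:
    """Largest number of full-length sequences whose cache fits in the budget."""
    return budget_bytes // kv_cache_bytes(cfg, seq_len, batch_size=1)


if __name__ == "__main__":
    GIB = 1024 ** 3
    cfg = DecoderConfig(num_layers=32, num_kv_heads=32, head_dim=128, bytes_per_elem=2)
    # Assumed workload: 4,096-token contexts and 40 GiB of memory reserved for the cache.
    print(kv_cache_bytes(cfg, seq_len=4096, batch_size=1) / GIB)
    print(max_batch_size(cfg, seq_len=4096, budget_bytes=40 * GIB))
```

Under these assumptions each token costs 0.5 MiB of cache, so one 4,096-token sequence needs 2 GiB and the 40 GiB budget caps the batch at 20 sequences. Storing the cache in 8 bits (bytes_per_elem=1) halves the footprint and doubles that ceiling, which is one reason quantization and cache management come up alongside batching.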

Syllabus

Scaling GenAI inference: Techniques, optimizations, and real-world lessons

Taught by

Weights & Biases