Overview
This 25-minute conference talk presents strategies for scaling generative AI inference from research to production. It covers model-level optimization techniques, including quantization, batching, caching, and hardware-aware optimizations, that close the performance gap between experimental results and real-world deployment. It also covers system-level practices such as redundancy, automated failover, and multi-cloud operations, which keep a service available during hardware failures, network fluctuations, and sudden traffic spikes. Together, these techniques form a resilient, production-ready foundation for AI systems that must meet enterprise-level demand with consistent performance.
Syllabus
Scaling Inference for Generative AI by Byung-Gon Chun
Taught by
Open Data Science