
Benchmarking GenAI Foundation Model Inference Optimizations on Kubernetes

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Learn about a Kubernetes SIG project designed to benchmark GenAI foundation model inference optimizations in this conference talk from KubeCon + CloudNativeCon. Foundation models are general-purpose deep learning models trained on vast datasets and capable of handling diverse tasks; serving them requires optimization techniques that minimize recurring inference costs while maintaining accuracy. Explore optimization methods including attention mechanism improvements such as flash attention and paged attention, model parameter optimizations such as quantization, and serving optimizations including in-flight batching, speculative decoding, disaggregated serving, and smart routing strategies. Understand the need for consistent frameworks to measure and benchmark inference performance when testing and deploying these techniques, and gain insight into how a standardized benchmarking approach validates the performance and usability of inference optimizations for real-world applications in Kubernetes environments.
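Benchmarking inference serving, as described in the talk, generally comes down to measuring per-request latency percentiles and aggregate token throughput against a model server. The sketch below illustrates that idea only; it is not the SIG project's harness, and `mock_generate` is a hypothetical stand-in for a real inference endpoint call.

```python
import time
import statistics

def mock_generate(prompt_tokens: int, output_tokens: int) -> int:
    """Hypothetical stand-in for a call to a model inference server.

    A real harness would issue an HTTP/gRPC request here; we simulate
    generation time as proportional to the number of output tokens.
    """
    time.sleep(0.001 * output_tokens)
    return output_tokens

def benchmark(num_requests: int = 20, output_tokens: int = 32) -> dict:
    """Run sequential requests and report latency percentiles and throughput."""
    latencies = []
    start = time.perf_counter()
    for _ in range(num_requests):
        t0 = time.perf_counter()
        mock_generate(prompt_tokens=16, output_tokens=output_tokens)
        latencies.append(time.perf_counter() - t0)
    wall_time = time.perf_counter() - start
    ordered = sorted(latencies)
    return {
        "p50_latency_s": statistics.median(latencies),
        "p95_latency_s": ordered[max(0, int(0.95 * len(ordered)) - 1)],
        "throughput_tok_per_s": num_requests * output_tokens / wall_time,
    }

if __name__ == "__main__":
    print(benchmark())
```

A production benchmark would additionally vary concurrency and request arrival patterns, since optimizations like in-flight batching and disaggregated serving only show their effect under concurrent load.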

Syllabus

Benchmarking GenAI Foundation Model Inference Optimizations on Kubernetes - S.M. Varghese & B. Slabe

Taught by

CNCF [Cloud Native Computing Foundation]

