
Scaling Production LLM Inference Using EKS Auto Mode and Ray Serve

Anyscale via YouTube

Overview

Learn how to deploy large-scale LLM inference systems without becoming a Kubernetes expert in this conference talk from Ray Summit 2025. Discover how AWS's EKS Auto Mode, combined with Ray Serve, creates a fully automated, production-grade serving platform that eliminates operational overhead and infrastructure-management complexity. A real-world deployment example illustrates the shift from labor-intensive, manually managed clusters to self-healing, cost-efficient systems. Topics include:

- Intelligent node provisioning tailored to AI workloads
- Automatic workload-driven scaling for CPUs and GPUs
- Built-in observability and seamless GPU lifecycle management
- Burst-capacity handling that keeps latency low under unpredictable loads
- Cost-optimization strategies for expensive inference accelerators

Understand how Ray Serve orchestrates high-throughput, multi-model LLM inference to build resilient systems that scale from prototype to production with minimal complexity. The talk offers a clear blueprint for deploying scalable, reliable, and cost-efficient LLM inference on AWS, aimed at ML engineers who want to focus on AI applications rather than infrastructure, and at platform teams seeking turnkey AI infrastructure.
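The Ray-Serve-on-EKS pattern described above is typically expressed as a KubeRay RayService manifest, which pairs a Ray Serve autoscaling config with GPU worker groups that EKS Auto Mode can provision on demand. The sketch below is illustrative only: the resource names, image tag, module path, and replica counts are assumptions, not configuration from the talk.

```yaml
# Illustrative RayService sketch for an EKS Auto Mode cluster.
# Names, image tags, and limits are hypothetical placeholders.
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: llm-inference              # hypothetical service name
spec:
  serveConfigV2: |
    applications:
      - name: llm-app
        import_path: serve_app:app # hypothetical module exposing a Ray Serve app
        deployments:
          - name: LLMServer
            autoscaling_config:
              min_replicas: 1      # scale replicas with request load
              max_replicas: 8
  rayClusterConfig:
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.9.0   # choose an image matching your Ray version
    workerGroupSpecs:
      - groupName: gpu-workers
        minReplicas: 0                      # idle cost is zero; nodes appear only when needed
        maxReplicas: 8
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                image: rayproject/ray:2.9.0
                resources:
                  limits:
                    nvidia.com/gpu: 1       # one GPU per worker pod
```

With `minReplicas: 0` on the GPU worker group, Ray Serve's autoscaler requests workers only under load, and EKS Auto Mode handles the corresponding GPU node provisioning and teardown, which is the cost-optimization behavior the talk highlights.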

Syllabus

Scaling Production LLM Inference Using EKS Auto Mode & Ray Serve | Ray Summit 2025

Taught by

Anyscale

