Overview
Learn how to deploy large-scale LLM inference systems without becoming a Kubernetes expert in this conference talk from Ray Summit 2025. Discover how AWS's EKS Auto Mode combined with Ray Serve creates a fully automated, production-grade serving platform that eliminates operational overhead and infrastructure management complexity. Explore the transformation from labor-intensive, manually managed clusters to self-healing, cost-efficient systems through a real-world deployment example.

Master intelligent node provisioning tailored to AI workloads, automatic workload-driven scaling for CPUs and GPUs, built-in observability, seamless GPU lifecycle management, burst-capacity handling that keeps latency low under unpredictable loads, and cost-optimization strategies for expensive inference accelerators. Understand how Ray Serve orchestrates high-throughput, multi-model LLM inference to create resilient systems that scale from prototype to production with minimal complexity.

Gain a clear blueprint for deploying scalable, reliable, and cost-efficient LLM inference on AWS. The talk is aimed at ML engineers who want to focus on AI applications rather than infrastructure, and at platform teams seeking turnkey AI infrastructure.
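As a rough illustration of the pattern the talk covers, Ray Serve applications are typically deployed on Kubernetes via the KubeRay operator's `RayService` custom resource, where Ray Serve autoscales replicas and the cluster autoscaler (here, EKS Auto Mode) provisions GPU nodes to match. The sketch below is a minimal, hypothetical manifest, not material from the talk: the application name, import path, image tag, and replica bounds are all illustrative assumptions.

```yaml
# Hypothetical RayService manifest sketch for KubeRay on EKS.
# Names, image tag, and replica counts are illustrative, not from the talk.
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: llm-inference            # assumed service name
spec:
  serveConfigV2: |
    applications:
      - name: llm-app
        import_path: llm_app:deployment   # hypothetical module:attribute
        deployments:
          - name: LLMServer
            autoscaling_config:           # Ray Serve scales replicas...
              min_replicas: 1
              max_replicas: 8
  rayClusterConfig:
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.9.0   # assumed version
    workerGroupSpecs:
      - groupName: gpu-workers
        minReplicas: 1
        maxReplicas: 4                       # ...and EKS Auto Mode adds
        rayStartParams: {}                   # GPU nodes to fit new pods
        template:
          spec:
            containers:
              - name: ray-worker
                image: rayproject/ray:2.9.0
                resources:
                  limits:
                    nvidia.com/gpu: 1        # one accelerator per worker
```

Under this setup, a traffic burst raises Ray Serve's desired replica count, the resulting unschedulable worker pods trigger node provisioning, and scale-down reverses the process, which is the self-healing, cost-efficient loop the talk describes.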
Syllabus
Scaling Production LLM Inference Using EKS Auto Mode & Ray Serve | Ray Summit 2025
Taught by
Anyscale