Overview
Learn how to deploy large-scale LLM inference systems without becoming a Kubernetes expert in this conference talk from Ray Summit 2025. Discover how AWS's EKS Auto Mode combined with Ray Serve creates a fully automated, production-grade serving platform that eliminates operational overhead and infrastructure management complexity. Explore the transformation from labor-intensive, manually managed clusters to self-healing, cost-efficient systems through a real-world deployment example.

Master intelligent node provisioning tailored for AI workloads, automatic workload-driven scaling for CPUs and GPUs, built-in observability, seamless GPU lifecycle management, burst-capacity handling that maintains low latency under unpredictable load, and cost-optimization strategies for expensive inference accelerators. Understand how Ray Serve orchestrates high-throughput, multi-model LLM inference to create resilient systems that scale from prototype to production with minimal complexity.

Gain a clear blueprint for deploying scalable, reliable, and cost-efficient LLM inference on AWS, ideal for ML engineers who want to focus on AI applications rather than infrastructure, and for platform teams seeking turnkey AI infrastructure.
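The workload-driven scaling the talk describes can be sketched as a simple replica calculator: size the fleet so each replica handles roughly a target number of in-flight requests, clamped between a floor and a ceiling. This mirrors the shape of Ray Serve's autoscaling configuration (minimum/maximum replicas and a per-replica request target), but the function below is a standalone illustration with hypothetical names, not Ray's actual implementation.

```python
import math


def desired_replicas(ongoing_requests: int,
                     target_per_replica: int,
                     min_replicas: int,
                     max_replicas: int) -> int:
    """Toy workload-driven scaler (illustration only, not Ray Serve code).

    Sizes the fleet so each replica carries roughly `target_per_replica`
    in-flight requests, clamped to [min_replicas, max_replicas].
    """
    raw = math.ceil(ongoing_requests / target_per_replica)
    return max(min_replicas, min(max_replicas, raw))


# Quiet period: scale down to the floor.
print(desired_replicas(0, 16, 1, 8))    # -> 1
# Burst: 200 in-flight requests at 16 per replica needs 13, capped at 8;
# the overflow is where burst capacity (extra headroom) keeps latency low.
print(desired_replicas(200, 16, 1, 8))  # -> 8
```

Clamping the upper bound is what makes GPU cost predictable: expensive accelerators never exceed a budgeted ceiling, and bursts beyond it are absorbed by queuing rather than unbounded scale-out.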
Syllabus
Scaling Production LLM Inference Using EKS Auto Mode & Ray Serve | Ray Summit 2025
Taught by
Anyscale