
AWS + vLLM - Building the Future of Open, Fast LLM Serving

Anyscale via YouTube

Overview

Learn how Amazon Web Services advances large-scale LLM inference through deep support for, and contributions to, vLLM, the leading open-source engine for high-throughput, low-latency model serving, in this 14-minute conference talk from Ray Summit 2025. Discover how vLLM serves as a foundational component of the Amazon Rufus shopping assistant, handling millions of customer requests through robust support for heterogeneous hardware, including AWS Trainium and NVIDIA GPUs. Explore Amazon's cost-optimized, multi-node inference architecture, which routes each request to the most appropriate accelerator, delivering substantial cost savings while maintaining top-tier performance. Examine best practices for deploying vLLM on AWS at scale, understand how Amazon builds multi-accelerator inference clusters from Trainium devices and GPUs, and review the open-source work streams and contributions Amazon has made to vLLM. Gain insight into Amazon's production-scale vLLM operations, learn to architect heterogeneous inference pipelines on AWS, and understand Amazon's initiatives to strengthen the vLLM ecosystem for AWS customers and the broader community.
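The heterogeneous routing idea described above can be illustrated with a minimal sketch. The pool names, costs, and capacity limits below are hypothetical placeholders, not Amazon's actual configuration, and the routing policy (cheapest eligible pool with free capacity) is one plausible strategy rather than the one presented in the talk:

```python
from dataclasses import dataclass

@dataclass
class Pool:
    """A group of accelerators serving vLLM replicas (values are illustrative)."""
    name: str
    cost_per_1k_tokens: float  # relative serving cost, hypothetical
    max_context: int           # largest prompt the pool's deployment accepts
    in_flight: int = 0
    capacity: int = 8          # concurrent requests before the pool is full

def route(pools: list[Pool], prompt_tokens: int) -> Pool:
    """Send the request to the cheapest pool that fits it and has headroom."""
    eligible = [p for p in pools
                if prompt_tokens <= p.max_context and p.in_flight < p.capacity]
    if not eligible:
        raise RuntimeError("no pool available; queue or shed the request")
    choice = min(eligible, key=lambda p: p.cost_per_1k_tokens)
    choice.in_flight += 1
    return choice

# Two hypothetical pools: a cheaper Trainium pool with a smaller context
# window, and a GPU pool that accepts longer prompts at higher cost.
pools = [
    Pool("trainium", cost_per_1k_tokens=0.4, max_context=8192),
    Pool("nvidia-gpu", cost_per_1k_tokens=1.0, max_context=32768),
]

print(route(pools, 4096).name)   # short prompt fits the cheaper pool
print(route(pools, 16384).name)  # long prompt must go to the GPU pool
```

Under these assumptions, short requests land on the lower-cost pool and only requests that exceed its context window spill over to the more expensive one, which is the basic mechanism behind the cost savings the talk attributes to accelerator-aware routing.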

Syllabus

AWS + vLLM: Building the Future of Open, Fast LLM Serving | Ray Summit 2025

Taught by

Anyscale

