
vLLM on AWS - Testing to Production and Everything in Between

AWS Events via YouTube

Overview

Explore practical architectural patterns for deploying and scaling large language models in production through this conference talk from AWS re:Invent 2025. The session walks through the complete journey from initial testing to production deployment, covering model evaluation with vLLM, performance benchmarking, and optimization techniques. Learn to implement efficient autoscaling with Ray and vLLM, and compare inference servers such as Triton and vLLM along with their respective trade-offs. Dive into productionization strategies using AIBrix and gain actionable insights for building robust, scalable LLM infrastructure. Aimed at ML engineers and architects working on LLM deployments, this 51-minute session provides hands-on guidance for managing the complexities of large language model operations in cloud environments.
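As one hedged illustration of the performance-benchmarking step the talk covers, the sketch below summarizes latency percentiles and token throughput from recorded request timings. The function names (`percentile`, `summarize`) and the sample numbers are hypothetical, not from the session; a real harness would gather these measurements by driving load against a running vLLM endpoint.

```python
# Hedged sketch: summarizing a benchmark run's latencies and token counts.
# All names and numbers here are illustrative assumptions, not from the talk.
import statistics


def percentile(values, p):
    """Nearest-rank percentile of a list of measurements."""
    s = sorted(values)
    k = max(0, min(len(s) - 1, round(p / 100 * (len(s) - 1))))
    return s[k]


def summarize(latencies_s, tokens_out, duration_s):
    """Reduce per-request latencies and output-token counts to headline metrics."""
    return {
        "p50_latency_s": percentile(latencies_s, 50),
        "p99_latency_s": percentile(latencies_s, 99),
        "mean_latency_s": statistics.mean(latencies_s),
        "throughput_tok_s": sum(tokens_out) / duration_s,
        "requests_per_s": len(latencies_s) / duration_s,
    }


# Example: five requests observed over a 10-second window.
latencies = [0.8, 1.1, 0.9, 2.4, 1.0]   # seconds per request
tokens = [120, 150, 130, 300, 140]      # output tokens per request
stats = summarize(latencies, tokens, duration_s=10.0)
print(stats["throughput_tok_s"])  # 84.0 tokens/s
print(stats["p50_latency_s"])     # 1.0 s
```

Metrics like these are what feed autoscaling decisions: a Ray-based deployment can scale replicas up when throughput saturates or tail latency (p99) climbs past a target.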

Syllabus

AWS re:Invent 2025 - vLLM on AWS: testing to production and everything in between (OPN414)

Taught by

AWS Events
