Scaling Instantly to 1000 GPUs for Serverless AI Inference

AWS Events via YouTube

Overview

Explore how to achieve instant GPU scaling for serverless AI inference in this conference talk from AWS re:Invent 2025. Discover Modal's innovative approach to building a Rust-based container stack that can dynamically spin thousands of GPUs up and down within seconds, addressing the critical challenge of cost-efficient GPU compute for generative AI model deployment. Learn about the technical architecture and strategies that enable flexible GPU access while maximizing cost efficiency in volatile production inference environments. Understand how Modal leverages AWS's compute and storage products to deliver instant, scalable GPU resources to ML engineers across organizations of all sizes, from startups to large enterprises. Gain insights into solving the fundamental tension between expensive GPU resources and unpredictable inference demand patterns in modern AI applications.
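The core tension the talk addresses — expensive GPUs versus bursty, unpredictable inference demand — is typically resolved with a scale-to-zero autoscaling policy: hold zero replicas when idle, and scale out just enough to drain the request queue when load arrives. The sketch below is a hypothetical illustration of that decision logic, not Modal's actual implementation; the function name, parameters, and the 1,000-replica cap are assumptions chosen to mirror the talk's framing.

```python
import math

def desired_gpu_replicas(queued_requests: int,
                         requests_per_replica: int,
                         max_replicas: int = 1000) -> int:
    """Scale-to-zero policy sketch: request exactly enough GPU replicas
    to drain the current queue, capped at a fleet-wide maximum."""
    if queued_requests <= 0:
        return 0  # no demand: release all GPUs (scale to zero)
    needed = math.ceil(queued_requests / requests_per_replica)
    return min(needed, max_replicas)

# Example: 17 queued requests, each replica handles 8 concurrently
# -> 3 replicas; an idle queue -> 0 replicas.
```

The interesting engineering problem, which the talk covers, is making the scale-out side of this loop fast: a policy like this only works if a cold replica (container image, model weights, GPU attach) can come up in seconds rather than minutes.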

Syllabus

AWS re:Invent 2025 - Scaling instantly to 1000 GPUs for Serverless AI inference (AIM2201)

Taught by

AWS Events
