

How DigitalOcean Builds Next-Gen Inference with Ray, vLLM and More

Anyscale via YouTube

Overview

Learn how to build a robust, scalable inference platform for next-generation generative models through this 17-minute conference talk from Ray Summit 2025. Discover DigitalOcean's approach to handling the rising complexity of inference as models grow in size, context length, and modality, using Ray and vLLM running on Kubernetes for both serverless and dedicated GPU workloads.

Explore how Ray's scheduling primitives ensure reliable execution across distributed clusters, while placement groups guarantee GPU affinity and predictable performance, with Ray observability tools providing deep insights into system health and workload behavior. Understand how vLLM delivers fast token streaming, optimized batching, and advanced memory/KV-cache management to meet real-world performance requirements. Examine two key operational modes: serverless inference for automatic scaling and cost efficiency, and dedicated inference for fine-grained GPU partitioning and performance isolation.

Dive into advanced optimization techniques for long-context models exceeding 8k tokens, including dynamic batching by token length, KV cache reuse strategies, and speculative decoding for improved latency and throughput. Get insights into the roadmap for a fully multimodal, multi-tenant inference platform featuring concurrent model orchestration, tenant isolation, security-aware billing, and a unified model registry for intelligent model placement and lifecycle management. Gain practical knowledge for building future-ready inference platforms capable of serving large, dynamic, multimodal generative models at scale, whether you're optimizing production stacks or architecting new systems.
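To make "dynamic batching by token length" concrete, here is a minimal sketch in plain Python. It is not DigitalOcean's implementation and not vLLM's scheduler. The function name `batch_by_token_length` and the `max_batch_tokens` budget are hypothetical, and the cost model (requests padded to the longest one in the batch) is an assumption chosen for illustration.

```python
from typing import List


def batch_by_token_length(lengths: List[int], max_batch_tokens: int) -> List[List[int]]:
    """Group request indices into batches that fit a padded-token budget.

    Hypothetical helper for illustration only. Assumes every request in a
    batch is padded to the longest request in that batch.
    """
    # Sort request indices by token length so similar lengths share a batch.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])

    batches: List[List[int]] = []
    current: List[int] = []
    for i in order:
        # Because of the sort, the incoming request is the longest so far,
        # so the padded cost is (batch size after adding it) * its length.
        padded_cost = (len(current) + 1) * lengths[i]
        if current and padded_cost > max_batch_tokens:
            # Adding this request would exceed the budget: close the batch.
            batches.append(current)
            current = []
        # A single request larger than the budget still gets its own batch.
        current.append(i)

    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Three short prompts and two long-context prompts (over 8k tokens).
    print(batch_by_token_length([100, 9000, 120, 8500, 90], 16384))
```

Under these assumptions, the short prompts end up together and each long-context prompt is batched alone, so short requests do not pay padding cost for long ones.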

Syllabus

How DigitalOcean Builds Next-Gen Inference with Ray, vLLM & More | Ray Summit 2025

Taught by

Anyscale
