
High-Throughput Inference for Synthetic Data and Evals at Sutro

Anyscale via YouTube

Overview

Learn how to optimize vLLM for massive-scale batch inference in this conference talk from Ray Summit 2025. Discover Sutro's approach to building an accelerated batch inference service that handles workloads ranging from hundreds of tokens to tens of billions of tokens per job, for synthetic data generation, evaluations, and large-scale unstructured data processing.

Explore why predictability in cost, performance, and execution transparency is critical for large offline workloads, and examine Sutro's deeply optimized, vLLM-powered inference engine designed specifically for large batch processing. Dive into the custom implementation layers built on top of vLLM: a performance profiler that measures and predicts system behavior in real time, throughput estimation algorithms that inform batching, scheduling, and hardware allocation, and cost attribution instrumentation that provides precise, job-level visibility into resource usage.

Gain practical techniques for designing transparent, high-performance vLLM infrastructure at scale, with insights particularly valuable for teams operating at large batch sizes, generating synthetic datasets, or building evaluation pipelines where cost predictability and throughput consistency are essential.
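To make the profiler-to-cost pipeline described above concrete, here is a minimal sketch of how throughput estimation could feed job-level cost prediction. All names, sample values, and formulas are illustrative assumptions for this listing, not Sutro's actual implementation.

```python
# Hypothetical sketch: estimate sustained throughput from profiler
# samples, then attribute a predicted cost to an offline batch job.
# Not Sutro's code; an assumed, simplified model of the idea.
from dataclasses import dataclass


@dataclass
class ProfileSample:
    """One profiler measurement window (assumed shape)."""
    tokens_processed: int
    wall_seconds: float


def estimate_throughput(samples: list[ProfileSample]) -> float:
    """Estimate sustained tokens/sec across recent profiler windows."""
    total_tokens = sum(s.tokens_processed for s in samples)
    total_seconds = sum(s.wall_seconds for s in samples)
    return total_tokens / total_seconds


def estimate_job_cost(job_tokens: int,
                      tokens_per_sec: float,
                      gpu_hourly_rate: float,
                      num_gpus: int = 1) -> float:
    """Predict job cost: runtime at estimated throughput x GPU price."""
    runtime_hours = job_tokens / tokens_per_sec / 3600.0
    return runtime_hours * gpu_hourly_rate * num_gpus


if __name__ == "__main__":
    samples = [ProfileSample(120_000, 10.0), ProfileSample(115_000, 9.5)]
    tps = estimate_throughput(samples)
    # A 1B-token job at an assumed $2.50/GPU-hour rate.
    cost = estimate_job_cost(1_000_000_000, tps, gpu_hourly_rate=2.50)
    print(f"estimated {tps:,.0f} tok/s, job cost ~${cost:,.2f}")
```

In a real system the throughput estimate would vary with model, sequence lengths, and hardware, so the estimator would be keyed on those dimensions rather than a single global average.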

Syllabus

High-Throughput Inference for Synthetic Data & Evals at Sutro | Ray Summit 2025

Taught by

Anyscale

