Overview
This 48-minute InfoQ video explores the challenges of scaling Large Language Model (LLM) batch inference and demonstrates how to combine Ray Data with vLLM for high-throughput, cost-effective processing. It covers techniques for using heterogeneous computing resources, adding fault tolerance for reliability, and optimizing inference pipelines for efficiency. Real-world case studies show significant performance improvements and cost reductions when processing large volumes of data through LLMs. The talk also presents practical approaches to common bottlenecks in batch inference workflows and to building scalable solutions for production environments.
Syllabus
Scaling LLM Batch Inference: Ray Data & vLLM for High Throughput
Taught by
InfoQ