Overview
This 48-minute InfoQ video explores the challenges of scaling Large Language Model (LLM) batch inference and demonstrates how to combine Ray Data with vLLM to achieve high throughput and cost-effective processing. Dive into techniques for leveraging heterogeneous computing resources, implementing fault tolerance for reliability, and optimizing inference pipelines for maximum efficiency. Examine real-world case studies that showcase significant performance improvements and cost reductions when processing large volumes of data through LLMs. Learn practical approaches to overcome common bottlenecks in batch inference workflows and implement scalable solutions for production environments.
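The fault-tolerance pattern mentioned above can be sketched in plain Python: split the workload into fixed-size batches and retry each batch when a worker fails. This is an illustrative, stdlib-only sketch — `run_batch` is a hypothetical stand-in for a real inference call (in practice, a Ray Data pipeline dispatching batches to vLLM engines), and the names and parameters are assumptions, not APIs from the talk.

```python
import random

def run_batch(batch, fail_rate=0.0):
    # Hypothetical stand-in for an LLM inference call (e.g. a vLLM engine
    # generating completions); raises to simulate a transient worker failure.
    if random.random() < fail_rate:
        raise RuntimeError("simulated worker failure")
    return [f"output:{item}" for item in batch]

def batch_inference(items, batch_size=4, max_retries=3, fail_rate=0.0):
    # Split the input into fixed-size batches so a failure only forces
    # re-running one batch, not the whole job -- the core idea behind
    # fault-tolerant batch inference at scale.
    results = []
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    for batch in batches:
        for attempt in range(max_retries):
            try:
                results.extend(run_batch(batch, fail_rate))
                break  # batch succeeded; move on
            except RuntimeError:
                if attempt == max_retries - 1:
                    raise  # give up after max_retries attempts
    return results
```

In a real deployment, frameworks like Ray handle this retry-and-reschedule logic automatically across a cluster, so a lost GPU worker only costs the in-flight batches rather than the entire run.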
Syllabus
Scaling LLM Batch Inference: Ray Data & vLLM for High Throughput
Taught by
InfoQ