NanoFlow - Towards Optimal Large Language Model Serving Throughput

USENIX via YouTube

Overview

Learn about NanoFlow, a serving framework designed to maximize Large Language Model (LLM) serving throughput, in this 16-minute conference presentation from OSDI '25. Researchers from the University of Washington, Tsinghua University, the University of California, Berkeley, and the University of Michigan challenge the common assumption that LLM serving is memory-bound: their detailed analysis shows that end-to-end LLM serving is compute-bound for most common workloads and LLMs. The key insight is that existing serving engines fall short of optimal compute utilization because the heterogeneous operations that make up LLM serving (compute, memory, and networking) are executed sequentially within a device. NanoFlow exploits intra-device parallelism instead: it splits inputs into smaller nano-batches and duplicates operations so that each copy works on its own portion independently, which lets the usage of heterogeneous resources overlap within a single device. An automatic optimization process identifies the number, size, ordering, and GPU resource allocation of nano-batches that minimize execution time while accounting for interference from concurrent operations. The evaluation covers popular models including LLaMA-2-70B, Mixtral 8×7B, and LLaMA-3-8B, where the framework achieves a 1.91× throughput boost compared to state-of-the-art serving systems and reaches 50% to 72% of optimal throughput across popular models with practical workloads.
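To make the nano-batch idea concrete, here is a minimal Python sketch of why overlapping heterogeneous resources can shorten execution time. It is a toy model, not NanoFlow's scheduler: the function names (`split_into_nano_batches`, `sequential_time`, `overlapped_time`) and the per-item stage costs are hypothetical, and it assumes each stage uses a different resource, costs scale linearly with batch size, and concurrent operations do not interfere with each other. The talk's automatic optimization exists precisely because those last two assumptions do not hold on real GPUs.

```python
from typing import List, Sequence


def split_into_nano_batches(batch: Sequence[int], n: int) -> List[List[int]]:
    """Split a batch into n contiguous, nearly equal nano-batches."""
    if n < 1 or n > len(batch):
        raise ValueError("n must be between 1 and len(batch)")
    base, extra = divmod(len(batch), n)
    nano_batches, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)  # first `extra` get one more item
        nano_batches.append(list(batch[start:start + size]))
        start += size
    return nano_batches


def sequential_time(stage_cost_per_item: Sequence[float], batch_size: int) -> float:
    """Time when every stage runs on the whole batch, one stage after another."""
    return sum(cost * batch_size for cost in stage_cost_per_item)


def overlapped_time(stage_cost_per_item: Sequence[float],
                    nano_sizes: Sequence[int]) -> float:
    """Makespan when nano-batches flow through the stages as a pipeline.

    Toy assumptions: each stage owns one resource (e.g. compute, memory,
    network), a resource serves one nano-batch at a time, and there is no
    interference between stages running concurrently.
    """
    resource_free_at = [0.0] * len(stage_cost_per_item)
    for size in nano_sizes:
        previous_stage_done = 0.0
        for stage, cost in enumerate(stage_cost_per_item):
            # A stage starts once its input is ready and its resource is free.
            start = max(previous_stage_done, resource_free_at[stage])
            resource_free_at[stage] = start + cost * size
            previous_stage_done = resource_free_at[stage]
    return resource_free_at[-1]


if __name__ == "__main__":
    costs = [2, 1, 1]             # hypothetical per-item cost of each stage
    requests = list(range(8))     # hypothetical request ids
    sizes = [len(nb) for nb in split_into_nano_batches(requests, 4)]
    print("sequential:", sequential_time(costs, len(requests)))
    print("overlapped:", overlapped_time(costs, sizes))
```

In this toy setting the pipelined schedule finishes sooner because the second and third stages work on earlier nano-batches while the first stage is busy with later ones. Choosing how many nano-batches to use, and how to size and order them, is the part the presentation describes as automated.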

Syllabus

OSDI '25 - NanoFlow: Towards Optimal Large Language Model Serving Throughput

Taught by

USENIX
