
PPipe - Efficient Video Analytics Serving on Heterogeneous GPU Clusters via Pool-Based Pipeline Parallelism

USENIX via YouTube

Overview

Learn about PPipe, a novel inference serving system that leverages pool-based pipeline parallelism to efficiently serve video analytics on heterogeneous GPU clusters in this 17-minute conference talk from USENIX ATC '25. Discover how researchers from Purdue University demonstrate that pipeline parallelism—traditionally used for throughput-oriented deep learning model training—can be applied effectively to latency-bound model inference. Explore how the diversity of model layers interacts with the diversity of GPU architectures, revealing that low-class and high-class GPUs can achieve comparable inference latency on many layers. Understand the system's architecture, which features an MILP-based control plane and a data plane that performs resource reservation-based adaptive batching. Examine evaluation results across 18 CNN models showing that PPipe achieves 41.1%–65.5% higher utilization of low-class GPUs while maintaining high utilization of high-class GPUs, resulting in 32.2%–75.1% higher serving throughput compared to baseline approaches. Gain insights into how this approach addresses the growing prevalence of heterogeneous GPU clusters in both public clouds and on-premise data centers.

Syllabus

USENIX ATC '25 - PPipe: Efficient Video Analytics Serving on Heterogeneous GPU Clusters via Pool-Based Pipeline Parallelism

Taught by

USENIX

