
SuperServe - Fine-Grained Inference Serving for Unpredictable Workloads

USENIX via YouTube

Overview

Explore a 16-minute conference presentation from NSDI '25 that introduces SuperServe, an ML inference serving system designed to handle unpredictable, bursty workloads in production environments. Learn about the challenges of serving multiple machine learning models under varying request patterns while balancing latency targets, accuracy requirements, and resource efficiency. Discover SubNetAct, a mechanism that inserts specialized control-flow operators into pre-trained, weight-shared super-networks to dynamically route requests through the super-network, actuating specific subnetworks that meet each request's latency and accuracy targets. Understand how this approach serves significantly more models while requiring up to 2.6× less memory than existing systems. Examine the SlackFit scheduling policy and see how SuperServe achieves 4.67% higher accuracy at the same latency targets, and 2.85× higher latency-target attainment at the same accuracy, on real-world Microsoft workload traces. Gain insights into fine-grained, reactive scheduling policies and their impact on ML inference serving efficiency from researchers at Georgia Institute of Technology, UC Berkeley, and Adobe.
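To make the idea concrete, here is a minimal sketch of a SlackFit-style selection policy. This is an illustration, not the authors' implementation: the subnetwork names, latency and accuracy numbers, and the `pick_subnetwork` helper are all hypothetical, and it assumes each subnetwork actuated from the super-network has a profiled latency and accuracy. Given a request's remaining latency budget (its "slack"), the policy picks the most accurate subnetwork that still fits.

```python
from dataclasses import dataclass

@dataclass
class SubNetwork:
    """A subnetwork actuated from a weight-shared super-network (hypothetical profile)."""
    name: str
    latency_ms: float   # profiled inference latency
    accuracy: float     # profiled accuracy on a validation set

def pick_subnetwork(subnets, slack_ms):
    """Return the highest-accuracy subnetwork whose latency fits the slack.

    If no subnetwork fits, fall back to the fastest one to minimize
    the latency-target violation.
    """
    feasible = [s for s in subnets if s.latency_ms <= slack_ms]
    if feasible:
        return max(feasible, key=lambda s: s.accuracy)
    return min(subnets, key=lambda s: s.latency_ms)

# Hypothetical profile: three subnetworks from one super-network.
profile = [
    SubNetwork("small",  latency_ms=5.0,  accuracy=0.70),
    SubNetwork("medium", latency_ms=12.0, accuracy=0.78),
    SubNetwork("large",  latency_ms=30.0, accuracy=0.82),
]

print(pick_subnetwork(profile, slack_ms=15.0).name)  # moderate slack -> "medium"
print(pick_subnetwork(profile, slack_ms=3.0).name)   # tight slack -> fastest, "small"
```

Because all subnetworks share weights in one super-network, switching between them requires no model loading, which is what makes this kind of per-request, fine-grained decision cheap enough to react to bursty traffic.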

Syllabus

NSDI '25 - SuperServe: Fine-Grained Inference Serving for Unpredictable Workloads

Taught by

USENIX

