Overview
This 15-minute conference talk from OSDI '23 presents AlpaServe, a serving system for deep learning models. AlpaServe uses model parallelism to enable statistical multiplexing across multiple devices, even when each model fits on a single device. The talk explains the trade-off involved: model parallelism adds overhead to each request, but statistical multiplexing lets a burst of requests for one model use devices that would otherwise sit idle, which reduces serving latency for bursty workloads. It then describes AlpaServe's strategy for placing and parallelizing a collection of large deep learning models across a distributed cluster. The evaluation on production workloads shows that AlpaServe can process requests at significantly higher rates and tolerate more burstiness while staying within latency constraints for over 99% of requests.
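The trade-off described above can be illustrated with a toy calculation. The sketch below is not AlpaServe's placement algorithm. It assumes a hypothetical setup with one GPU dedicated to a model versus the same model split into two pipeline stages across two GPUs, and a made-up 10% parallelism overhead. All function names and parameters are illustrative.

```python
def dedicated_latencies(arrivals, service_time):
    """One GPU holds the whole model and serves requests FIFO."""
    free_at = 0.0
    latencies = []
    for t in sorted(arrivals):
        start = max(t, free_at)          # wait until the GPU is free
        free_at = start + service_time
        latencies.append(free_at - t)
    return latencies


def pipelined_latencies(arrivals, service_time, stages=2, overhead=0.1):
    """The model is split into pipeline stages, one stage per GPU.

    Each stage takes an equal share of the work plus an assumed
    fractional overhead (hypothetical value, for illustration only).
    """
    stage_time = service_time / stages * (1 + overhead)
    stage_free = [0.0] * stages
    latencies = []
    for t in sorted(arrivals):
        ready = t
        for i in range(stages):
            start = max(ready, stage_free[i])  # wait for this stage's GPU
            ready = start + stage_time
            stage_free[i] = ready
        latencies.append(ready - t)
    return latencies


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    burst = [0.0, 0.0, 0.0, 0.0]  # four requests for one model arrive together
    print("single request:",
          dedicated_latencies([0.0], 1.0), pipelined_latencies([0.0], 1.0))
    print("burst mean latency:",
          mean(dedicated_latencies(burst, 1.0)),
          mean(pipelined_latencies(burst, 1.0)))
```

Under these assumptions a lone request is slower on the pipelined placement because it pays the overhead, while a burst of four requests has a lower mean latency because it can use the second GPU that the dedicated placement leaves idle.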
Syllabus
OSDI '23 - AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving
Taught by
USENIX