Torpor - GPU-Enabled Serverless Computing for Low-Latency, Resource-Efficient Inference

USENIX via YouTube

Overview

This 16-minute conference presentation from USENIX ATC '25 introduces Torpor, a serverless platform for GPU-efficient, low-latency inference. Researchers from The Chinese University of Hong Kong, Hong Kong University of Science and Technology, Alibaba Group, and Nokia Bell Labs address a gap in existing serverless platforms: they lack efficient GPU support for high-performance inference. Torpor keeps models in main memory and swaps them onto GPUs dynamically when requests arrive (late binding with model swapping). To minimize the latency overhead of swapping, it combines asynchronous API redirection, GPU runtime sharing, pipelined model execution, and efficient GPU memory management. An interference-aware request scheduling algorithm leverages high-speed GPU interconnects to meet latency service-level objectives (SLOs) for individual inference functions. In the reported results, Torpor concurrently serves hundreds of inference functions on a worker node with 4 GPUs while achieving latency comparable to native execution, and a pilot deployment shows GPU provisioning cost reductions of 70% for users and 65% for the platform.
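
The idea of keeping models in main memory and swapping them onto a GPU only when a request arrives can be illustrated with a small simulation. This is a toy sketch, not Torpor's implementation: the class names (`SimulatedGpu`, `LateBindingWorker`), the least-recently-used eviction rule, and the "most free memory" placement rule are all assumptions made for illustration, and no real GPU calls are involved.

```python
from collections import OrderedDict


class SimulatedGpu:
    """Toy GPU (hypothetical): tracks resident models and their memory use."""

    def __init__(self, gpu_id, capacity_mb):
        self.gpu_id = gpu_id
        self.capacity_mb = capacity_mb
        # model name -> size in MB, ordered from least to most recently used
        self.resident = OrderedDict()

    def used_mb(self):
        return sum(self.resident.values())

    def swap_in(self, name, size_mb):
        """Make `name` resident, evicting least recently used models if needed.

        Returns the list of evicted model names. LRU eviction is an
        assumption of this sketch, not a claim about Torpor.
        """
        if size_mb > self.capacity_mb:
            raise ValueError(f"{name} does not fit on GPU {self.gpu_id}")
        evicted = []
        while self.used_mb() + size_mb > self.capacity_mb:
            victim, _ = self.resident.popitem(last=False)
            evicted.append(victim)
        self.resident[name] = size_mb
        return evicted


class LateBindingWorker:
    """Toy worker node (hypothetical): models live in host memory by default."""

    def __init__(self, gpus, host_models):
        self.gpus = gpus
        self.host_models = host_models  # model name -> size in MB

    def handle_request(self, name):
        """Pick a GPU for the request only at arrival time (late binding)."""
        size_mb = self.host_models[name]
        for gpu in self.gpus:
            if name in gpu.resident:
                gpu.resident.move_to_end(name)  # mark as most recently used
                return gpu.gpu_id, "hit", []
        # Miss: choose the GPU with the most free memory (an assumed policy)
        # and swap the model in from host memory.
        target = max(self.gpus, key=lambda g: g.capacity_mb - g.used_mb())
        evicted = target.swap_in(name, size_mb)
        return target.gpu_id, "swap", evicted


worker = LateBindingWorker(
    gpus=[SimulatedGpu(0, 1000), SimulatedGpu(1, 1000)],
    host_models={"a": 600, "b": 600, "c": 600},
)
trace = [worker.handle_request(m) for m in ["a", "b", "a", "c", "b"]]
```

In this toy trace, three 600 MB models share two 1000 MB GPUs, so a request for a model that is not resident forces an eviction. A real system would also have to account for transfer time, interference between co-located functions, and latency objectives, which this sketch deliberately leaves out.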

Syllabus

USENIX ATC '25 - Torpor: GPU-Enabled Serverless Computing for Low-Latency, Resource-Efficient...

Taught by

USENIX
