

Colocating ML Inference and Training with Fast GPU Memory Handover

USENIX via YouTube

Overview

Learn about SIRIUS, a GPU resource sharing system that efficiently colocates machine learning inference and training workloads, in this 15-minute conference presentation from USENIX ATC '25. Discover how the system meets strict latency Service Level Objectives (SLOs) for inference tasks while maximizing GPU utilization by running training tasks on the leftover resources.

Explore three key innovations: leveraging the characteristics of gradient computation to adjust training memory consumption within milliseconds, managing memory reclamation explicitly so the handover is safe, and employing SLO-aware memory reallocation to minimize initialization overhead and prevent thrashing under fluctuating workloads (a rough sketch of such a control loop follows below).

Examine evaluation results showing that SIRIUS outperforms existing colocation approaches, improving inference SLO compliance by 57.0% on average (up to 97.0%) and training throughput by 2.2× (up to 13.7×). Gain insights into spatial GPU resource sharing techniques, memory management strategies for machine learning workloads, and practical approaches to maximizing GPU utilization in production environments where inference and training tasks must coexist efficiently.
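To make the handover idea more concrete, here is a minimal, hypothetical sketch in Python of an SLO-aware memory handover control loop. It is not the SIRIUS implementation: the class names (`TrainingJob`, `InferenceServer`), the thresholds, the cooldown value, and the toy demand model are all assumptions made purely for illustration of the shrink/reclaim/regrow pattern described in the talk.

```python
"""Minimal sketch (not the SIRIUS implementation) of an SLO-aware GPU
memory handover loop. All names, constants, and the demand model below
are hypothetical stand-ins for illustration only."""
import random
import time

GPU_MEMORY_MB = 40_000          # assumed total device memory
REALLOC_COOLDOWN_S = 2.0        # hysteresis to avoid thrashing under bursty load


class TrainingJob:
    """Training task that can shrink its footprint quickly by capping
    its memory budget before the next gradient computation step."""
    def __init__(self, full_budget_mb):
        self.budget_mb = full_budget_mb

    def shrink_to(self, budget_mb):
        # In SIRIUS this handover reportedly completes within milliseconds;
        # here we simply record the new cap.
        self.budget_mb = max(budget_mb, 0)

    def grow_to(self, budget_mb):
        self.budget_mb = budget_mb


class InferenceServer:
    """Inference task with a strict latency SLO and a reserved baseline."""
    def __init__(self, reserved_mb):
        self.reserved_mb = reserved_mb

    def predicted_demand_mb(self, load):
        # Toy model: memory demand scales linearly with request load (0.0-1.0).
        return int(self.reserved_mb + load * 20_000)


def control_loop(steps=10):
    train = TrainingJob(full_budget_mb=GPU_MEMORY_MB // 2)
    infer = InferenceServer(reserved_mb=10_000)
    last_realloc = 0.0

    for step in range(steps):
        load = random.random()                  # stand-in for observed load
        demand = infer.predicted_demand_mb(load)
        leftover = GPU_MEMORY_MB - demand

        now = time.monotonic()
        if leftover < train.budget_mb:
            # Inference at risk of missing its SLO: reclaim training memory
            # immediately (fast handover path).
            train.shrink_to(leftover)
            last_realloc = now
        elif now - last_realloc > REALLOC_COOLDOWN_S:
            # Load has eased and the cooldown has passed: return memory to
            # training, but only up to the leftover (SLO-aware reallocation).
            train.grow_to(leftover)
            last_realloc = now

        print(f"step {step}: load={load:.2f} "
              f"inference={demand} MB training_budget={train.budget_mb} MB")


if __name__ == "__main__":
    control_loop()
```

The asymmetry in the loop mirrors the idea described in the overview: reclamation happens immediately whenever inference demand rises, while regrowth is deliberately delayed to avoid paying initialization overhead repeatedly when the workload fluctuates.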

Syllabus

USENIX ATC '25 - Colocating ML Inference and Training with Fast GPU Memory Handover

Taught by

USENIX

