
GPU-Disaggregated Serving for Deep Learning Recommendation Models at Scale

USENIX via YouTube

Overview

Learn about Prism, a production deep learning recommendation model (DLRM) serving system that addresses GPU fragmentation through resource disaggregation, in this 18-minute conference presentation from NSDI '25. Discover why online recommender systems struggle to provision DLRM services efficiently at scale: these models need many CPU cores and large amounts of memory but only a few GPUs, which wastes resources on multi-GPU servers. Explore Prism's architecture, which separates CPU nodes and heterogeneous GPU nodes into independently scalable resource pools connected via RDMA, and automatically divides DLRMs into CPU- and GPU-intensive subgraphs for optimized scheduling. Examine the system's latency-minimization techniques, including optimal graph partitioning, topology-aware resource management, and SLO-aware communication scheduling, which reduce CPU fragmentation by 53% and GPU fragmentation by 27% in crowded clusters. Understand how Prism enables efficient capacity loaning from training clusters during seasonal events, saving over 90% of GPUs, and learn from real-world deployment insights from a system that has run on more than 10,000 GPUs in production for over two years.
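The overview describes Prism dividing a DLRM into CPU-intensive subgraphs (sparse embedding lookups, which are memory-bound) and GPU-intensive subgraphs (dense MLP layers, which are compute-bound) so each can be scheduled on its own resource pool. A minimal sketch of that idea is below; the operator names and the kind-based cost model are hypothetical illustrations, not Prism's actual code or API.

```python
# Hypothetical sketch of the CPU/GPU subgraph split described in the talk.
# Op names and the kind-based partitioning rule are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Op:
    name: str
    kind: str  # e.g. "embedding_lookup" (memory-bound) or "mlp" (compute-bound)

# Memory-bound sparse ops are assigned to the CPU pool;
# compute-bound dense layers are assigned to the GPU pool.
CPU_KINDS = {"embedding_lookup", "feature_transform"}

def partition(ops):
    """Split a DLRM op list into CPU- and GPU-intensive subgraphs."""
    cpu_subgraph = [op for op in ops if op.kind in CPU_KINDS]
    gpu_subgraph = [op for op in ops if op.kind not in CPU_KINDS]
    return cpu_subgraph, gpu_subgraph

model = [
    Op("user_emb", "embedding_lookup"),
    Op("item_emb", "embedding_lookup"),
    Op("interact", "feature_transform"),
    Op("top_mlp", "mlp"),
]
cpu_part, gpu_part = partition(model)
print([op.name for op in cpu_part])  # sparse lookups -> CPU node pool
print([op.name for op in gpu_part])  # dense MLP -> GPU node pool
```

In the disaggregated design the talk describes, the two subgraphs would then run on separate CPU and GPU nodes, exchanging intermediate tensors over RDMA.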

Syllabus

NSDI '25 - GPU-Disaggregated Serving for Deep Learning Recommendation Models at Scale

Taught by

USENIX

