Overview
Explore a conference presentation that introduces GMI-DRL, a novel systematic approach for scaling deep reinforcement learning (DRL) across multi-GPU platforms through adaptive-grained parallelism. Learn how researchers from Rice University, UC Santa Barbara, UC San Diego, University of Rochester, and Pacific Northwest National Laboratory address the inefficiencies in DRL computation caused by heterogeneous tasks and complex inter-task interactions on modern multi-GPU systems.

Discover the GPU Multiplexing Instance (GMI) concept, which provides a unified, resource-adjustable sub-GPU design tailored to heterogeneous DRL scaling tasks. Understand how the adaptive Coordinator component manages workloads and resources to optimize system performance, while the specialized Communicator enables efficient inter-GMI communication to meet diverse communication requirements.

Examine experimental results demonstrating GMI-DRL's performance against state-of-the-art DRL acceleration solutions: up to 2.34x higher training throughput and up to 40.8% better GPU utilization on the DGX-A100 platform. Gain insight into the growing importance of DRL in robotics applications such as industrial control and autonomous driving, and how this research addresses critical scalability challenges in the field.
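The division of labor described above — heterogeneous DRL stages mapped onto resource-adjustable sub-GPU instances by an adaptive coordinator — can be sketched in plain Python. Everything here (the `GMI` class, the proportional-share rule, the task names and load numbers) is a hypothetical illustration of the idea, not GMI-DRL's actual API or allocation policy.

```python
from dataclasses import dataclass

# Hypothetical sketch of adaptive-grained GPU multiplexing:
# a coordinator splits one GPU into resource-adjustable
# sub-instances ("GMIs") sized to each DRL task's measured load.

@dataclass
class GMI:
    task: str        # DRL pipeline stage hosted by this sub-GPU instance
    sm_share: float  # fraction of the GPU's compute assigned to it

def coordinate(loads: dict[str, float], total_share: float = 1.0) -> list[GMI]:
    """Assign each task a GPU share proportional to its relative load."""
    total_load = sum(loads.values())
    return [GMI(task, total_share * load / total_load)
            for task, load in loads.items()]

# Illustrative heterogeneous DRL stages with made-up relative loads.
gmis = coordinate({"env_simulation": 3.0, "inference": 1.0, "training": 4.0})
for g in gmis:
    print(f"{g.task}: {g.sm_share:.0%} of GPU")
```

An adaptive coordinator would re-run an allocation step like this as measured loads drift, resizing the sub-GPU instances rather than leaving a static partition to sit idle.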
Syllabus
USENIX ATC '25 - GMI-DRL: Empowering Multi-GPU DRL with Adaptive-Grained Parallelism
Taught by
USENIX