Overview
Explore an innovative approach to parallelizing embedding tables for large-scale recommendation models in this 20-minute conference talk from USENIX ATC '24. Dive into OPER, an algorithm-system co-design that addresses the challenges of deploying Deep Learning Recommendation Models (DLRMs) across multiple GPUs. Learn how OPER's optimality-guided embedding table (EMT) parallelization technique improves upon existing methods by accounting for input-dependent behavior, yielding more balanced workload distribution and reduced inter-GPU communication. Discover the heuristic search algorithm used to approximate near-optimal EMT parallelization and the implementation of a distributed shared-memory-based system that supports fine-grained EMT parallelization. Gain insights into the significant performance improvements OPER achieves, with reported average speedups of 2.3× in training and 4.0× in inference over state-of-the-art DLRM frameworks.
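The core idea behind input-aware EMT parallelization can be illustrated with a minimal greedy sketch. This is not OPER's actual heuristic search (the talk describes a more elaborate algorithm); the function name `balance_tables` and the per-table cost model here are assumptions for illustration only, showing how input-dependent access costs can drive a balanced assignment of embedding tables to GPUs.

```python
# Illustrative sketch only; OPER's real heuristic search is more sophisticated.
import heapq

def balance_tables(table_loads, num_gpus):
    """Greedily assign embedding tables to GPUs so the per-GPU load
    (sum of input-dependent access costs, e.g. rows touched per batch)
    stays roughly balanced.

    table_loads: dict mapping table_id -> estimated cost for this workload
    Returns: dict mapping gpu_id -> list of table_ids assigned to it
    """
    # Min-heap of (current_load, gpu_id): always place the next table
    # on the currently least-loaded GPU.
    heap = [(0.0, g) for g in range(num_gpus)]
    heapq.heapify(heap)
    assignment = {g: [] for g in range(num_gpus)}
    # Placing heavier tables first tightens the greedy bound.
    for tid, cost in sorted(table_loads.items(), key=lambda kv: -kv[1]):
        load, gpu = heapq.heappop(heap)
        assignment[gpu].append(tid)
        heapq.heappush(heap, (load + cost, gpu))
    return assignment
```

For example, with hypothetical costs `{"user": 9.0, "item": 7.0, "cat": 4.0, "geo": 3.0}` and 2 GPUs, the sketch pairs the heaviest table with the lightest ones, keeping per-GPU totals close (12.0 vs. 11.0) rather than splitting tables by count alone.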
Syllabus
USENIX ATC '24 - OPER: Optimality-Guided Embedding Table Parallelization for Large-scale...
Taught by
USENIX