Overview
Learn how to extend InfiniBand GPUDirect Async (IBGDA) support beyond NVIDIA GPUs and InfiniBand networks to build GPU-agnostic solutions for open AI systems. Discover how IBGDA, traditionally limited to specific hardware configurations, can be implemented on Ethernet NICs running RoCE (RDMA over Converged Ethernet). Explore the technical challenges of adapting this high-performance networking technology for broader compatibility, and understand optimization strategies for Open Compute Project (OCP) AI systems and various GPU platforms. Examine how RDMA enables AI scale-out interconnects, and how IBGDA improves latency and message rates by allowing GPUs to initiate RDMA transactions between endpoints directly. Gain insight into the architectural considerations and practical implementation approaches for making advanced GPU networking capabilities more accessible across different hardware ecosystems in AI infrastructure deployments.
Syllabus
Enable IBGDA support in a GPU-agnostic manner for open AI systems
Taught by
Open Compute Project