Overview
Discover how Ray Direct Transport (RDT) eliminates critical bottlenecks in distributed GPU workloads in this 32-minute conference talk from Ray Summit 2025. Anyscale engineers Stephanie Wang and Qiaolin Yu demonstrate how RDT accelerates AI and reinforcement learning systems by removing the tensor serialization and CPU memory routing required in traditional Ray pipelines.

The talk explores the core problem: in GPU-intensive workloads, every tensor must pass through CPU memory and the Ray object store, creating significant slowdowns in reinforcement learning for LLMs and in multimodal training pipelines. RDT instead keeps data on the GPU and enables direct actor-to-actor transfers over high-performance backends such as NCCL, Gloo, and RDMA, yielding zero-copy, serialization-free GPU communication.

The speakers also examine RDT's clean integration with Ray's ObjectRef API, which makes adoption straightforward, and dive into the architecture enabling direct device-to-device data movement across distributed clusters. Live demonstrations show RDT powering cutting-edge workloads, including disaggregated multimodal training and reinforcement learning for large language models, and illustrate how RDT turns Ray into a high-performance GPU coordination layer for next-generation large-scale AI applications.
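The copy-elimination idea behind the talk can be sketched conceptually, without Ray or a GPU. This is not Ray's actual RDT API; the two function names below are illustrative. The first path mimics the traditional object-store route (serialize, stage a copy, deserialize), while the second mimics a direct handoff in which the consumer receives the producer's buffer without any copy:

```python
import pickle

import numpy as np


def via_object_store(tensor: np.ndarray) -> np.ndarray:
    """Conceptual stand-in for the traditional Ray path: the tensor is
    serialized, staged in CPU memory (the object store), and deserialized
    by the consumer -- incurring copies and serialization overhead."""
    blob = pickle.dumps(tensor)   # serialize (device-to-CPU copy in the real system)
    return pickle.loads(blob)     # deserialize on the consumer side (another copy)


def via_direct_transport(tensor: np.ndarray) -> np.ndarray:
    """Conceptual stand-in for the RDT path: the producer's buffer is
    handed to the consumer directly (NCCL/Gloo/RDMA in the real system),
    with no serialization and no CPU staging copy."""
    return tensor                 # zero-copy handoff of the same buffer


t = np.arange(4, dtype=np.float32)
assert not np.shares_memory(t, via_object_store(t))   # staged path copied the data
assert np.shares_memory(t, via_direct_transport(t))   # direct path shares the buffer
```

The `shares_memory` checks make the difference concrete: the staged path produces a new buffer on every hop, while the direct path moves only a reference, which is what lets RDT-style transfers scale with tensor size instead of paying per-byte serialization cost.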
Syllabus
Ray Direct Transport: RDMA Support in Ray Core | Ray Summit 2025
Taught by
Anyscale