YouTube

RDMA P2P Deep Dive - KvCache Transfer, Weight Updates and MoE Routing at Perplexity

Anyscale via YouTube

Overview

Explore RDMA-based peer-to-peer communication strategies for modern Large Language Model systems in this 31-minute conference talk from Ray Summit 2025. Learn how current LLM systems rely on collective communication through torch.distributed and NCCL APIs, which create unnecessary constraints on peer-to-peer data movement as models scale and Mixture-of-Experts architectures become more prevalent. Discover how inter-node communication becomes a critical bottleneck when workloads diversify across training and inference boundaries. Examine essential RDMA primitives and understand the design of Perplexity's new communication library API through three high-impact use cases: KvCache transfer for disaggregated inference, weight transfer between training and inference nodes during reinforcement learning rollouts, and MoE dispatch-combine all-to-all kernels. Gain insights into alternative communication strategies that can unlock new efficiencies in next-generation LLM and MoE systems, moving beyond traditional SPMD-based approaches to enable more flexible and efficient data movement patterns.
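The dispatch-combine all-to-all pattern mentioned above can be sketched in plain Python. This is a minimal single-process simulation, not Perplexity's library or an RDMA kernel: lists stand in for per-rank buffers, and the function names (`dispatch`, `combine`, `expert_of`) are illustrative assumptions.

```python
def dispatch(tokens_per_rank, expert_of):
    """All-to-all dispatch: each rank sends each token to the rank
    hosting its chosen expert. Receive buffers keep (src_rank, token)
    pairs so combine can route results back to their origin."""
    world = len(tokens_per_rank)
    recv = [[] for _ in range(world)]
    for src, tokens in enumerate(tokens_per_rank):
        for tok in tokens:
            dst = expert_of(tok) % world  # map expert id to hosting rank
            recv[dst].append((src, tok))
    return recv

def combine(recv, world):
    """Reverse all-to-all: after expert computation, return each
    (processed) token to the rank it originally came from."""
    out = [[] for _ in range(world)]
    for rank_buf in recv:
        for src, tok in rank_buf:
            out[src].append(tok)
    return out

# Two ranks, two tokens each; expert id chosen per token (here: the
# token value itself, so even tokens land on rank 0, odd on rank 1).
tokens = [[1, 4], [2, 3]]
recv = dispatch(tokens, expert_of=lambda t: t)
returned = combine(recv, world=2)
```

In a real MoE system the two hops are GPU-to-GPU transfers (e.g. RDMA writes) rather than list appends, but the routing bookkeeping — remembering each token's source rank so combine can invert dispatch — is the same.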

Syllabus

RDMA P2P Deep Dive: KvCache Transfer, Weight Updates & MoE Routing at Perplexity | Ray Summit 2025

Taught by

Anyscale
