Overview
Learn how to accelerate distributed graph neural network (GNN) training through innovative in-network processing techniques in this 16-minute conference presentation from USENIX ATC '25.

Discover the challenges facing distributed GNN training systems, including memory limitations, redundant traffic, and bandwidth bottlenecks that arise when partitioning large graphs across multiple workers. Explore how traditional approaches suffer from complex dependencies among graph data and limited switch-aggregator resources, leading to performance degradation.

Understand the proposed SwitchGNN solution, which addresses these issues through coordinated in-network multicast and aggregation. It features a graph-aware multicast reordering algorithm that prioritizes vertices with higher neighbor counts to reduce communication time, and a multi-level graph partitioning mechanism that prevents aggregator overflow by partitioning boundary vertices into independent blocks for batch processing while maintaining graph propagation correctness.

Review the implementation details using P4 programmable switches and a DPDK host stack, along with experimental results from a real testbed and NS3 simulations demonstrating up to a 74% reduction in training time through effective reduction of communication overhead.
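The two scheduling ideas summarized above can be sketched in a few lines. This is a hypothetical illustration of the concepts as described, not the actual SwitchGNN implementation: degree-prioritized reordering sends high-neighbor-count vertices first, and capacity-limited partitioning groups boundary vertices into blocks small enough for the switch aggregator.

```python
# Hypothetical sketch of the two ideas described in the talk, not the
# real SwitchGNN code: (1) reorder boundary vertices so those with more
# neighbors are multicast first, and (2) partition boundary vertices
# into blocks that fit an assumed switch-aggregator capacity.

def reorder_by_degree(boundary_vertices, adjacency):
    """Sort boundary vertices so those with more neighbors come first."""
    return sorted(boundary_vertices,
                  key=lambda v: len(adjacency[v]),
                  reverse=True)

def partition_into_blocks(vertices, aggregator_capacity):
    """Split vertices into independent blocks for batch processing,
    so no batch exceeds the aggregator's slot count."""
    return [vertices[i:i + aggregator_capacity]
            for i in range(0, len(vertices), aggregator_capacity)]

# Toy example: vertex -> neighbor list
adjacency = {0: [1, 2, 3], 1: [0], 2: [0, 3], 3: [0, 2]}
order = reorder_by_degree([0, 1, 2, 3], adjacency)
blocks = partition_into_blocks(order, aggregator_capacity=2)
print(order)   # [0, 2, 3, 1] -- highest-degree vertex first
print(blocks)  # [[0, 2], [3, 1]] -- batches bounded by capacity
```

The real system performs this scheduling inside P4 programmable switches; the sketch only shows the ordering and batching logic on the host side.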
Syllabus
USENIX ATC '25 - Accelerating Distributed Graph Learning by Using Collaborative In-Network...
Taught by
USENIX