Overview
Learn about In-Network Collective (INC) acceleration solutions for AI and machine learning workloads in this 24-minute conference talk from the Open Compute Project. Discover how network switches can offload critical collective operations such as AllReduce, ReduceScatter, and AllGather to overcome communication bandwidth limitations in GPU-based systems. Explore how performing reduction operations directly in the switch fabric can reduce network bandwidth requirements by half compared to traditional GPU-based approaches, while enabling higher Model FLOPS Utilization (MFU) and a lower memory footprint at GPU endpoints. Examine a practical INC offload solution implemented in high-performance, low-latency Ethernet switches, including detailed performance measurements and real-world application benefits. Get updates on ongoing standardization efforts within the Ultra Ethernet Consortium (UEC) and OCP SAI communities that are shaping the future of network-accelerated AI infrastructure.
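To see where the "half the bandwidth" figure comes from, consider the standard per-endpoint traffic of a ring AllReduce versus an in-switch reduction. This sketch is illustrative only (not taken from the talk): the buffer size and GPU counts are hypothetical, and the ring formula 2(n−1)/n · S is the textbook cost of ReduceScatter followed by AllGather, while in-network reduction sends the buffer up to the switch once and receives the reduced result once.

```python
def ring_allreduce_traffic(size_bytes: float, n: int) -> float:
    """Bytes each endpoint sends (and receives) in a ring AllReduce:
    (n-1)/n of the buffer for ReduceScatter plus (n-1)/n for AllGather."""
    return 2 * (n - 1) / n * size_bytes

def in_network_allreduce_traffic(size_bytes: float, n: int) -> float:
    """Bytes each endpoint sends with in-switch reduction: the full
    buffer goes up once; the reduced result comes back down once."""
    return size_bytes  # per direction, independent of n

if __name__ == "__main__":
    S = 1 << 30  # hypothetical 1 GiB gradient buffer
    for n in (8, 64, 512):
        ring = ring_allreduce_traffic(S, n)
        inc = in_network_allreduce_traffic(S, n)
        print(f"n={n:4d}: ring {ring / 2**30:.3f} GiB, "
              f"INC {inc / 2**30:.3f} GiB, ratio {ring / inc:.2f}x")
```

As the GPU count grows, the ratio approaches 2x, which is the bandwidth reduction the talk attributes to performing the reduction in the switch fabric.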
Syllabus
In Network Collective acceleration for AI Fabrics
Taught by
Open Compute Project