Overview
Explore a technical presentation from Cloud Field Day 19 on Broadcom's high-performance Ethernet NIC solutions for AI/ML clusters. Distinguished Engineer and Architect Hemal Shah explains how increasingly complex AI/ML workloads demand robust networking, focusing on the advanced features of the Thor 2 400G NIC. Learn about RDMA over Converged Ethernet (RoCE), congestion control mechanisms, and the importance of end-to-end fabric management in large-scale networks. Discover a reference architecture for AI/ML clusters that scales to thousands of GPUs, built on Broadcom switches and NICs. Gain insight into key technical specifications, including a PCIe Gen 5 x16 host interface, hardware root of trust security, and bidirectional line-rate throughput with low latency. Understand how these networking solutions contribute to shorter job completion times and better overall cluster performance in demanding AI/ML environments.
Syllabus
Broadcom Thor 2: High Performance Ethernet NIC for AI/ML
Taught by
Tech Field Day