Boosting Training and Inference Performance via Topology-Aware Scheduling of Heterogeneous Resources
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn how ByteDance improves LLM workload performance through enhanced topology-aware scheduling in this technical conference talk. The talk covers scheduling on high-density processors, including die-level affinity and anti-affinity between memory-bandwidth-intensive pods. It also covers inter-RDMA affinity at the ToR (top-of-rack) level to prevent switch congestion, GPU-RDMA affinity at the PCIe switch level to accelerate communication via GPUDirect RDMA, and job-level topology affinity in a Kubernetes scheduler that operates at the pod level. Along the way, it addresses the limitations of Kubernetes topology management on new-generation processors and the shift of performance bottlenecks from computation to networking, with practical approaches for handling heterogeneous resources such as GPU and RDMA.
Syllabus
Boosting Training and Inference Performance via Topology-Aware Scheduling of Heterogeneous... He Cao
Taught by
CNCF [Cloud Native Computing Foundation]