Boosting Training and Inference Performance via Topology-Aware Scheduling of Heterogeneous Resources
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn how ByteDance optimizes LLM workload performance through enhanced topology-aware scheduling in this technical conference talk. Explore solutions for managing high-density processors, including die-level affinity and anti-affinity configuration between memory-bandwidth-intensive pods. Discover techniques for achieving inter-RDMA affinity at the ToR (top-of-rack) switch level to prevent switch congestion, implementing GPU-RDMA affinity at the PCIe switch level to accelerate communication via GPUDirect RDMA, and establishing job-level topology affinity on top of the Kubernetes scheduler's pod-level operations. Gain insights into addressing the limitations of Kubernetes topology management for new-generation processors, the shift of performance bottlenecks from computation to networking, and practical approaches for scheduling heterogeneous resources such as GPUs and RDMA NICs.
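Upstream Kubernetes pod anti-affinity operates only at node-label granularity, which is one of the limitations the talk addresses; die-level or PCIe-switch-level placement within a node needs custom scheduler support like the system described. As a minimal sketch of the anti-affinity idea using stock Kubernetes primitives, the spec below spreads memory-bandwidth-intensive pods across nodes. The label `workload-class: membw-intensive` and the image name are illustrative assumptions, not from the talk; only `kubernetes.io/hostname` is a standard well-known topology key.

```yaml
# Sketch: keep memory-bandwidth-intensive pods apart using stock
# pod anti-affinity. Granularity is per-node (kubernetes.io/hostname);
# sub-node (die-level) affinity requires a custom scheduler extension.
apiVersion: v1
kind: Pod
metadata:
  name: training-worker            # illustrative name
  labels:
    workload-class: membw-intensive  # assumed label convention
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                workload-class: membw-intensive
            topologyKey: kubernetes.io/hostname
  containers:
    - name: worker
      image: registry.example.com/llm-train:latest  # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1
```

Using `preferredDuringSchedulingIgnoredDuringExecution` rather than the `required` variant keeps the constraint soft, so jobs still schedule when the cluster is full.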
Syllabus
Boosting Training and Inference Performance via Topology-Aware Scheduling of Heterogeneous... He Cao
Taught by
CNCF [Cloud Native Computing Foundation]