High-Scale Networking for ML Workloads With Cilium
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
This conference talk explores how G-Research uses Cilium for networking in its machine learning environment, which spans more than 10,000 nodes. Cilium is the core networking solution for the company's on-premises, bare-metal clusters, each of which scales up to 1,000 nodes. The talk covers the Cilium features the team relies on: network policies that enforce strict security controls to protect market-sensitive information, the host firewall, which eliminates the need for external firewall appliances, and the high-performance eBPF dataplane, which directly improves ML job performance. It also covers advanced topics such as limiting Cilium's identity labels to reduce policy map pressure, tuning conntrack garbage collection, and understanding the performance implications of different policies at scale. Viewers will learn how to use Cilium's built-in tools to observe and measure large deployments, and what to watch for when managing large Kubernetes clusters.
Syllabus
High-Scale Networking for ML Workloads With Cilium - Luigi Zhou, G-Research
Taught by
CNCF [Cloud Native Computing Foundation]