High-Scale Networking for ML Workloads With Cilium
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
This conference talk explores how G-Research uses Cilium for networking in a machine learning environment spanning more than 10,000 nodes. Cilium serves as the core networking solution for their on-premises, bare-metal clusters, each of which scales up to 1,000 nodes. The talk covers key Cilium features: network policy for enforcing the strict security controls that protect market-sensitive information, the host firewall, which eliminates the need for external firewall appliances, and the high-performance eBPF dataplane, which directly improves ML job performance. It also covers advanced topics such as limiting Cilium's identity labels to reduce policy map pressure, tuning conntrack garbage collection, and understanding the performance implications of different policies at scale. Viewers gain practical knowledge of using Cilium's built-in tools to observe and measure large deployments, and learn what to watch for when managing large Kubernetes clusters.
Syllabus
High-Scale Networking for ML Workloads With Cilium - Luigi Zhou, G-Research
Taught by
CNCF [Cloud Native Computing Foundation]