CNCF [Cloud Native Computing Foundation]

Scaling GPU Clusters Without Melting Down

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

This conference talk explores strategies for scaling GPU clusters on Kubernetes as GPUs become more powerful and able to run concurrent workloads. Learn from NVIDIA's experience right-sizing a Kubernetes control plane to keep pace with growing business demands. Discover how to measure control-plane resource consumption and apply techniques that improve performance and scalability, including Go (golang) runtime tunables, kube-apiserver parameters such as --goaway-chance, and scheduler configuration. Understand the often-overlooked impact of the volume of YAML per API call on system performance. Explore how simulation tools such as KWOK (Kubernetes WithOut Kubelet) can be used to evaluate new Kubernetes features, such as Dynamic Resource Allocation (DRA), for control-plane scalability before deploying them to production.
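The summary names kube-apiserver's --goaway-chance without explaining it. According to the kube-apiserver reference, the flag sets the fraction of requests answered with an HTTP/2 GOAWAY, which makes the client reconnect and, behind a load balancer, likely land on a different apiserver; the documented range is 0 (off) to 0.02. The toy simulation here is a sketch under stated assumptions, not NVIDIA's method or Kubernetes code: it assumes every client starts pinned to one server (as can happen after a rolling restart), each client sends one request per round, and the load balancer picks a server uniformly at random. The function names simulate_goaway and max_share are hypothetical.

```python
import random
from collections import Counter


def simulate_goaway(num_servers: int, num_clients: int, rounds: int,
                    goaway_chance: float, seed: int = 0) -> Counter:
    """Toy model of kube-apiserver's --goaway-chance behind a load balancer.

    Assumptions (illustrative only): every client starts on server 0,
    each client sends one request per round over a long-lived HTTP/2
    connection, and a client that receives GOAWAY reconnects through a
    load balancer that picks a server uniformly at random.
    """
    if not 0.0 <= goaway_chance <= 0.02:
        # Mirrors the documented bounds: 0 (off) up to 0.02 (1 in 50 requests).
        raise ValueError("goaway_chance must be within [0, 0.02]")
    rng = random.Random(seed)
    # connection[i] is the server that client i is currently attached to.
    connection = [0] * num_clients
    for _ in range(rounds):
        for client in range(num_clients):
            # With probability goaway_chance the server answers this request
            # with GOAWAY, so the client opens a fresh connection via the LB.
            if rng.random() < goaway_chance:
                connection[client] = rng.randrange(num_servers)
    return Counter(connection)


def max_share(load: Counter, num_clients: int) -> float:
    """Fraction of all clients attached to the busiest server."""
    return max(load.values()) / num_clients


if __name__ == "__main__":
    for chance in (0.0, 0.001, 0.02):
        load = simulate_goaway(num_servers=3, num_clients=300,
                               rounds=2000, goaway_chance=chance)
        print(f"goaway-chance={chance}: busiest server holds "
              f"{max_share(load, 300):.0%} of clients")
```

With goaway_chance=0 every client stays on the first server indefinitely; a small nonzero value gradually spreads connections across servers, at the cost of occasional reconnects.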

Syllabus

Scaling GPU Clusters Without Melting Down! - Alay Patel & Ryan Hallisey, NVIDIA

Taught by

CNCF [Cloud Native Computing Foundation]
