Divide and Conquer: Master GPU Partitioning and Visualize Savings with OpenCost
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn how to reduce GPU workload costs on Kubernetes in this 31-minute conference talk on GPU partitioning and cost visualization with OpenCost. See how the NVIDIA DCGM exporter integrates with Prometheus to monitor GPU metrics, and how the NVIDIA GPU Operator supports resource management. The talk examines the main factors that drive AI/ML workload costs, including GPU utilization, VM sizing, and idle time, and explains how OpenCost improves communication between developer and platform teams by giving both visibility into spend and accountability for it. It also covers practical techniques for saving money when running AI and ML workloads, particularly Large Language Models (LLMs), at scale on Kubernetes.
Syllabus
Divide and Conquer: Master GPU Partitioning and Visualize Savings with OpenCost - Kayse Yu
Taught by
CNCF [Cloud Native Computing Foundation]