Divide and Conquer: Master GPU Partitioning and Visualize Savings with OpenCost
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn how to optimize GPU workload costs in Kubernetes through a 31-minute conference talk that explores GPU partitioning and cost visualization using OpenCost. Discover the integration of NVIDIA DCGM exporter with Prometheus for effective GPU metrics monitoring, and understand how to leverage the NVIDIA GPU Operator for enhanced resource management. Explore the critical factors affecting AI/ML workload costs, including GPU utilization, VM sizing, and idle time, while gaining insights into how OpenCost bridges communication between developer and platform teams through spend visibility and accountability. Master practical techniques for achieving significant cost savings when running AI and ML workloads, particularly Large Language Models (LLMs), at scale on Kubernetes platforms.
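As a rough illustration of why utilization and idle time drive cost, the arithmetic can be sketched in a few lines of Python. This is not taken from the talk or from OpenCost itself: the function names, the flat hourly rate, and the even split across partitions are all assumptions made for the example.

```python
# Hypothetical helpers for illustration only; not part of OpenCost or any NVIDIA tool.

def idle_cost(hourly_rate: float, hours: float, utilization: float) -> float:
    """Return the share of the GPU bill spent on idle time.

    Assumes a flat hourly rate and an average utilization between 0 and 1.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return hourly_rate * hours * (1.0 - utilization)


def cost_per_workload(hourly_rate: float, hours: float, partitions: int) -> float:
    """Return the cost per workload when one GPU is split evenly into partitions.

    Assumes each partition hosts exactly one workload for the full duration.
    """
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    return hourly_rate * hours / partitions


if __name__ == "__main__":
    # Made-up numbers: a GPU billed at 4.00 per hour, run for 10 hours.
    print(idle_cost(4.0, 10, 0.25))
    print(cost_per_workload(4.0, 10, 4))
```

Under these assumptions, a GPU that sits mostly idle pays for capacity nobody uses, while splitting it among several smaller workloads spreads the same bill across them. Real pricing and partition sizes vary by provider and GPU model.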
Syllabus
Divide and Conquer: Master GPU Partitioning and Visualize Savings with OpenCost - Kayse Yu
Taught by
CNCF [Cloud Native Computing Foundation]