Optimizing Data Locality and GPU Utilization for Training Workloads in Kubernetes

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Learn how to optimize data locality and GPU utilization for machine learning training workloads on Kubernetes in this 32-minute conference talk. Explore the data processing and storage challenges organizations face when scaling model training in cloud-native environments, in particular managing massive training datasets across distributed storage systems while maintaining optimal I/O performance. Discover why Kubernetes, which excels at compute orchestration, struggles to distribute data across multiple storage backends, and how the resulting bottlenecks hurt training performance and raise infrastructure costs.

Examine a Kubernetes-native distributed caching system that uses NVMe storage to overcome these data locality challenges and improve overall system performance. Gain insights from real-world, large-scale production use cases that show how this architecture reduces data infrastructure costs, increases GPU utilization, and enables workload portability to cope with GPU scarcity in modern cloud-native machine learning deployments.

Syllabus

Optimizing Data Locality and GPU Utilization for Training Workloads in Kubernetes - Bin Fan, Alluxio

Taught by

CNCF [Cloud Native Computing Foundation]
