Linux Foundation

Enabling Seamless AI Workloads - Achieving Zero-Downtime Upgrades for FUSE in Kubernetes

Linux Foundation via YouTube

Overview

Explore a conference talk on keeping high-throughput AI workloads running in Kubernetes while the underlying filesystem is upgraded. Learn how to perform zero-downtime upgrades of FUSE (Filesystem in Userspace) clients that serve demanding applications such as autonomous driving and large-scale recommendation systems. Discover practical solutions to the problems that typically occur when a FUSE filesystem is upgraded or restarted: invalidated file descriptors, lost caches, and interrupted writes. Examine real-world strategies for self-healing mounts and rolling client upgrades in FUSE-based distributed file systems, integrated with the Kubernetes Container Storage Interface (CSI) and Operators. Understand why the default CSI lifecycle is inadequate for FUSE-based systems, and see how the client upgrade process can be redesigned so that active I/O sessions continue without disruption. Benefit from lessons learned in large-scale production deployments, including an analysis of key failure cases in early versions and how the solution evolved to keep GPUs fully utilized during maintenance.

Syllabus

Enabling Seamless AI Workloads: Achieving Zero-Downtime Upgrades for FUSE in Kubernetes - Weiwei Zhu

Taught by

Linux Foundation