Empowering ML Workloads With Kubeflow: JAX Distributed Training and LLM Hyperparameter Optimization
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
This 20-minute conference talk from CNCF explores how Kubeflow can enhance machine learning workloads through JAX distributed training and large language model (LLM) hyperparameter optimization. Presented by Hezhi Xie and Sandipan Panda, the session addresses the growing need for scalable ML solutions in distributed environments. Learn about recent Kubeflow innovations that improve distributed training on Kubernetes with JAX and automate hyperparameter optimization for LLMs. Discover how JAX's high-performance capabilities can be integrated with Kubernetes for efficient scaling, and how the speakers extended Kubeflow to support distributed JAX workloads. The presentation also covers a high-level API that automates the previously manual and time-intensive process of LLM hyperparameter optimization. Together, these advancements make complex, resource-intensive training more efficient and position Kubeflow as a powerful platform for modern AI development workflows.
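To make the "distributed JAX on Kubernetes" idea concrete, the sketch below shows what a Kubeflow Training Operator manifest for a JAX job could look like. This is a hedged illustration, not taken from the talk: the `JAXJob` kind and `jaxReplicaSpecs` field follow the Training Operator's convention for other `*Job` resources, but the exact schema may differ from what the speakers built, and the job name, image, and script are placeholders.

```yaml
# Hypothetical JAXJob manifest (schema assumed from the Kubeflow
# Training Operator's PyTorchJob/TFJob conventions; verify against
# the operator version you run).
apiVersion: kubeflow.org/v1
kind: JAXJob
metadata:
  name: jax-distributed-example   # placeholder name
spec:
  jaxReplicaSpecs:
    Worker:
      replicas: 2                 # two coordinated JAX processes
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: jax
              image: example.io/jax-train:latest   # placeholder image
              command: ["python", "train.py"]      # placeholder entrypoint
```

Inside such a `train.py`, each worker would typically call `jax.distributed.initialize()` early on; that JAX API can pick up the coordinator address, process count, and process index from environment variables, which is the kind of wiring a training operator would inject into each replica.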
Syllabus
Empowering ML Workloads With Kubeflow: JAX Distributed Training and LLM Hyperparameter Optimization - Hezhi Xie & Sandipan Panda
Taught by
CNCF [Cloud Native Computing Foundation]