Multi-Node Finetuning LLMs on Kubernetes: A Practitioner's Guide

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

This conference talk is a practical, step-by-step guide to multi-node finetuning of large language models (LLMs) on Kubernetes clusters with GPUs. It shows how to combine PyTorch FSDP (Fully Sharded Data Parallel) with the Kubeflow training operator. It also covers cluster preparation, optimization techniques, and performance comparisons across different network topologies. The key configurations discussed are pod networking, secondary networks, and GPUDirect RDMA over Ethernet, all tuned for performance. Finetuning on private enterprise data can improve a model's performance on specific downstream tasks. However, it requires substantial compute resources and raises challenges specific to Kubernetes. The talk walks through the implementation details needed to run multi-node LLM finetuning in production Kubernetes environments.
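The following sketch illustrates where the Kubeflow training operator fits in. It builds a PyTorchJob manifest as a Python dict: one Master pod plus Worker pods, one pod per GPU node. It is a minimal sketch under stated assumptions, not the speakers' configuration. The job name, container image, the train_fsdp.py entrypoint, and the secondary network name are all hypothetical placeholders. The optional Multus annotation assumes that a NetworkAttachmentDefinition for an RDMA-capable interface already exists in the namespace. No RDMA device resource is requested, because that resource name depends on the device plugin your cluster uses. Field names follow the kubeflow.org/v1 PyTorchJob API; check them against the training operator version installed in your cluster.

```python
import json
from typing import Optional


def build_pytorchjob(
    name: str,
    image: str,
    num_nodes: int,
    gpus_per_node: int,
    secondary_network: Optional[str] = None,
) -> dict:
    """Build a kubeflow.org/v1 PyTorchJob manifest as a plain dict.

    One Master replica plus (num_nodes - 1) Worker replicas, each pod
    requesting `gpus_per_node` NVIDIA GPUs. All names are placeholders.
    """
    if num_nodes < 1 or gpus_per_node < 1:
        raise ValueError("num_nodes and gpus_per_node must be >= 1")

    def pod_template() -> dict:
        metadata: dict = {}
        if secondary_network:
            # Multus reads this annotation to attach an extra interface
            # (assumed here to be RDMA-capable) next to the default pod network.
            metadata["annotations"] = {"k8s.v1.cni.cncf.io/networks": secondary_network}
        return {
            "metadata": metadata,
            "spec": {
                "containers": [
                    {
                        # The training operator looks for a container named "pytorch".
                        "name": "pytorch",
                        "image": image,
                        # Hypothetical FSDP training script; rendezvous settings are
                        # assumed to come from environment variables the operator sets.
                        "command": ["torchrun", "train_fsdp.py"],
                        "resources": {"limits": {"nvidia.com/gpu": gpus_per_node}},
                    }
                ]
            },
        }

    replica_specs = {
        "Master": {"replicas": 1, "restartPolicy": "OnFailure", "template": pod_template()},
    }
    if num_nodes > 1:
        replica_specs["Worker"] = {
            "replicas": num_nodes - 1,
            "restartPolicy": "OnFailure",
            "template": pod_template(),
        }

    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "PyTorchJob",
        "metadata": {"name": name},
        "spec": {
            # One worker process per GPU on each node.
            "nprocPerNode": str(gpus_per_node),
            "pytorchReplicaSpecs": replica_specs,
        },
    }


if __name__ == "__main__":
    manifest = build_pytorchjob(
        name="llm-finetune",                    # hypothetical
        image="registry.example.com/ft:latest",  # hypothetical
        num_nodes=2,
        gpus_per_node=8,
        secondary_network="rdma-net",           # hypothetical
    )
    print(json.dumps(manifest, indent=2))
```

You can save the printed JSON and submit it with `kubectl apply -f`. The Master and Worker pod templates are identical by design. Every node therefore runs the same FSDP script and differs only in the rank-related environment the operator injects.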

Syllabus

Multi-Node Finetuning LLMs on Kubernetes: A Practitioner’s Guide - Ashish Kamra & Boaz Ben Shabat

Taught by

CNCF [Cloud Native Computing Foundation]
