Linux Foundation

Generative AI Model Data Pre-Training on Kubernetes - A Use Case Study

Linux Foundation via YouTube

Overview

Explore how to leverage Kubernetes for large-scale generative AI model data preprocessing in this conference talk from the Linux Foundation. Large Language Models require preprocessing vast amounts of data, often spanning petabytes, a process that can take days due to its complexity and scale. Discover how Kubeflow Pipelines (KFP) simplifies LLM data processing by providing flexibility, repeatability, and scalability for enterprise applications, as demonstrated through its daily use at IBM Research for building indemnified LLMs. Compare data preparation toolkits built on Kubernetes, Rust, Slurm, or Spark, and understand how to choose the right toolkit for LLM experiments or enterprise use cases. Examine how the open-source Data Prep Toolkit uses KFP and KubeRay to orchestrate scalable pipelines for steps such as deduplication, content classification, and tokenization. Gain insights from real-world challenges, lessons learned, and practical experience with KFP, and explore its applicability to diverse LLM tasks such as data preprocessing, RAG retrieval, and model fine-tuning.
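To make the preprocessing stages named above concrete, here is a hypothetical single-node sketch of a dedup → classify → tokenize flow in plain Python. All function names and the toy rules are illustrative assumptions, not the Data Prep Toolkit's API; in the talk's setting these would run as distributed transforms orchestrated by KFP and KubeRay across a cluster.

```python
import hashlib

def deduplicate(docs):
    """Exact dedup via content hashing (toy stand-in for the fuzzy
    dedup a production corpus pipeline would use)."""
    seen, out = set(), []
    for d in docs:
        h = hashlib.sha256(d.encode("utf-8")).hexdigest()
        if h not in seen:
            seen.add(h)
            out.append(d)
    return out

def classify(doc):
    """Toy content filter: keep documents with at least three words.
    Real pipelines apply quality/toxicity/language classifiers."""
    return len(doc.split()) >= 3

def tokenize(doc):
    """Toy whitespace tokenizer; real pipelines use a model-specific
    subword tokenizer."""
    return doc.lower().split()

def pipeline(docs):
    """Chain the three stages, mirroring the orchestration order
    described in the talk."""
    docs = deduplicate(docs)
    docs = [d for d in docs if classify(d)]
    return [tokenize(d) for d in docs]

corpus = [
    "The cat sat down.",
    "The cat sat down.",   # exact duplicate, dropped by dedup
    "Hi",                  # too short, dropped by the filter
    "Kubernetes scales pipelines",
]
print(pipeline(corpus))
```

In a KFP deployment, each stage would be its own containerized pipeline component so it can scale and be retried independently; chaining plain functions here just shows the data flow.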

Syllabus

Generative AI Model Data Pre-Training on Kubernetes: A Use Case Study - Anish Asthana & Mohammad Nassar

Taught by

Linux Foundation

Reviews

Start your review of Generative AI Model Data Pre-Training on Kubernetes - A Use Case Study
