
Generative AI Model Data Pre-Training on Kubernetes - A Use Case Study

DevConf via YouTube

Overview

Explore how Kubeflow Pipelines (KFP) streamlines Large Language Model data preprocessing at enterprise scale in this 34-minute conference talk from DevConf.US 2025. Learn how IBM Research processes petabytes of data daily using KFP to build indemnified LLMs for enterprise applications, addressing complexity and scale challenges in workloads that can take days to process. Discover why Kubernetes-based solutions were chosen over alternatives such as Slurm or Spark for LLM experiments and enterprise use cases. Examine how the open source Data Prep Toolkit leverages KFP and KubeRay for scalable pipeline orchestration, including critical processing steps such as deduplication, content classification, and tokenization. Gain insights into real-world challenges, lessons learned, and practical applications of KFP across diverse LLM tasks, including data preprocessing, RAG retrieval, and model fine-tuning, with speaker Santosh Borse sharing direct experience from IBM Research's daily operations.
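One of the preprocessing stages the talk covers is deduplication. As a rough illustration only (this is not the Data Prep Toolkit's actual API, and real pipelines run such steps distributed via KFP and KubeRay), exact deduplication can be sketched by hashing each document and keeping the first occurrence of each digest:

```python
import hashlib

def dedup_exact(docs):
    """Drop exact-duplicate documents by content hash (illustrative sketch)."""
    seen = set()
    unique = []
    for doc in docs:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

corpus = ["hello world", "foo bar", "hello world"]
print(dedup_exact(corpus))  # the repeated document appears only once
```

At petabyte scale this hash-and-filter idea is applied in parallel across workers rather than in a single in-memory set, which is where the Kubernetes-based orchestration described in the talk comes in.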

Syllabus

Generative AI Model Data Pre-Training on Kubernetes: A Use Case Study - DevConf.US 2025

Taught by

DevConf

