Building Scalable ML Inferencing Pipelines Using Kubernetes
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore a conference talk that delves into building robust and scalable machine learning inference pipelines using Kubernetes. Learn how to construct performant inference services that scale on demand while maintaining low latency. Discover proven procedures and guidelines for managing inference pipelines on Kubernetes, including detailed insights into hardware requirements (GPU/CPU/memory) and essential K8s configurations for various inference engines. Master the implementation of fault-tolerant pipelines for Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) using fundamental Kubernetes constructs such as Operators, StatefulSets, and PersistentVolumes. Gain practical knowledge about setting up automated monitoring and effective strategies for troubleshooting and fixing hardware and software failures in production environments.
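As a taste of the constructs the talk covers, here is a minimal sketch using the official Kubernetes Python client to declare a StatefulSet for a GPU-backed LLM inference server, with a persistent volume claim template for model weights. The image name, resource sizes, namespace, and labels are illustrative assumptions, not values from the talk.

```python
from kubernetes import client, config

# Load local kubeconfig (use config.load_incluster_config() inside a cluster).
config.load_kube_config()

sts = client.V1StatefulSet(
    api_version="apps/v1",
    kind="StatefulSet",
    metadata=client.V1ObjectMeta(name="llm-inference"),
    spec=client.V1StatefulSetSpec(
        service_name="llm-inference",
        replicas=2,  # scale out/in on demand
        selector=client.V1LabelSelector(match_labels={"app": "llm-inference"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "llm-inference"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="inference-engine",
                        # Hypothetical inference engine image; substitute your own.
                        image="nvcr.io/nvidia/tritonserver:24.05-py3",
                        resources=client.V1ResourceRequirements(
                            requests={"cpu": "4", "memory": "16Gi"},
                            # GPU request: schedules the pod onto a GPU node.
                            limits={"nvidia.com/gpu": "1", "memory": "32Gi"},
                        ),
                        volume_mounts=[
                            client.V1VolumeMount(
                                name="model-store", mount_path="/models"
                            )
                        ],
                    )
                ]
            ),
        ),
        # Each replica gets its own PersistentVolume for cached model weights.
        volume_claim_templates=[
            client.V1PersistentVolumeClaim(
                metadata=client.V1ObjectMeta(name="model-store"),
                spec=client.V1PersistentVolumeClaimSpec(
                    access_modes=["ReadWriteOnce"],
                    resources=client.V1ResourceRequirements(
                        requests={"storage": "100Gi"}
                    ),
                ),
            )
        ],
    ),
)

client.AppsV1Api().create_namespaced_stateful_set(namespace="default", body=sts)
```

A StatefulSet (rather than a Deployment) gives each replica a stable identity and its own volume claim, which suits engines that download and cache large model weights locally; an Operator can then watch the workload and recover failed replicas automatically.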
Syllabus
Scalable ML Inferencing Pipeline Using K8s - Smitha Jayaram & Vinod Eswaraprasad, NVIDIA
Taught by
CNCF [Cloud Native Computing Foundation]