Building Scalable ML Inferencing Pipelines Using Kubernetes
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore a conference talk on building robust, scalable machine learning inference pipelines with Kubernetes. Learn how to construct performant inference services that scale on demand while maintaining low latency. Discover proven procedures and guidelines for operating inference pipelines on Kubernetes, including hardware sizing (GPU/CPU/memory) and the essential K8s configuration for various inference engines. Learn how to implement fault-tolerant pipelines for Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) using core Kubernetes constructs such as operators, StatefulSets, and persistent volumes. Gain practical knowledge of setting up automated monitoring and effective strategies for troubleshooting and remediating hardware and software failures in production.
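As a minimal sketch of the kind of Kubernetes configuration the talk covers (all names, images, and resource figures below are illustrative assumptions, not taken from the talk), an inference Deployment can request an NVIDIA GPU explicitly via the device plugin's extended resource, while a StatefulSet with a persistent volume claim gives a RAG vector store stable identity and storage:

```yaml
# Hypothetical LLM inference service requesting one NVIDIA GPU per replica.
# Requires the NVIDIA device plugin to be installed on the cluster.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-inference            # illustrative name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: llm-inference
  template:
    metadata:
      labels:
        app: llm-inference
    spec:
      containers:
      - name: inference-engine
        image: example.com/llm-server:latest   # placeholder image
        resources:
          requests:
            cpu: "4"
            memory: 16Gi
            nvidia.com/gpu: 1    # GPUs are requested as an extended resource
          limits:
            nvidia.com/gpu: 1    # GPU requests and limits must match
---
# Hypothetical StatefulSet backing the RAG vector store with a
# persistent volume, so data survives pod rescheduling.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: vector-store             # illustrative name
spec:
  serviceName: vector-store
  replicas: 1
  selector:
    matchLabels:
      app: vector-store
  template:
    metadata:
      labels:
        app: vector-store
    spec:
      containers:
      - name: store
        image: example.com/vector-db:latest    # placeholder image
        volumeMounts:
        - name: data
          mountPath: /var/lib/vectordb
  volumeClaimTemplates:          # each replica gets its own PVC
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 50Gi
```

On-demand scaling of the inference Deployment would typically be layered on top with a HorizontalPodAutoscaler, driven by CPU utilization or by custom metrics such as request latency or GPU utilization.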
Syllabus
Scalable ML Inferencing Pipeline Using K8s - Smitha Jayaram & Vinod Eswaraprasad, NVIDIA
Taught by
CNCF [Cloud Native Computing Foundation]