Effortless Scalability - Orchestrating Large Language Model Inference with Kubernetes
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
This 23-minute conference talk by Joinal Ahmed of Navatech AI covers deploying and orchestrating large open-source inference models on Kubernetes. It shows how to automate the deployment of heavyweight models such as Falcon and Llama 2 using Kubernetes Custom Resource Definitions (CRDs), with large model files managed through container images. Topics include serving inference calls through an HTTP server, eliminating manual tuning of deployment parameters, and auto-provisioning GPU nodes based on each model's requirements. The talk also explains how users can deploy containerized models by supplying a pod template in the inference field of the workspace custom resource, from which deployment workloads that use all GPU nodes are created dynamically.
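To give a rough idea of what a workspace custom resource with an inference pod template might look like, here is a minimal Python sketch that builds such a manifest as a plain dictionary. It is an illustration only, not the schema from the talk: the API group `example.com/v1alpha1`, the field names `resource`, `count`, `instanceType`, and `template`, the helper `build_workspace`, and the image and instance-type values are all assumptions. Only `nvidia.com/gpu` is a real Kubernetes extended resource name.

```python
import json


def build_workspace(name, model_image, gpu_nodes, instance_type):
    """Return a dict shaped like a hypothetical Workspace custom resource.

    All field names below are illustrative assumptions, not a real CRD schema.
    """
    return {
        "apiVersion": "example.com/v1alpha1",  # hypothetical API group
        "kind": "Workspace",
        "metadata": {"name": name},
        # GPU nodes a controller could auto-provision for this model.
        "resource": {
            "count": gpu_nodes,
            "instanceType": instance_type,
        },
        # Pod template a controller could turn into a deployment workload.
        "inference": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": "model-server",
                            # Model files ship inside the container image.
                            "image": model_image,
                            # HTTP port for inference calls (assumed value).
                            "ports": [{"containerPort": 80}],
                            "resources": {"limits": {"nvidia.com/gpu": 1}},
                        }
                    ]
                }
            }
        },
    }


if __name__ == "__main__":
    workspace = build_workspace(
        name="falcon-demo",
        model_image="registry.example.com/falcon-7b:latest",  # hypothetical image
        gpu_nodes=2,
        instance_type="gpu-large",  # hypothetical instance type
    )
    print(json.dumps(workspace, indent=2))
```

In a setup like this, the user declares only the model image and the GPU requirements, and a controller watching the custom resource would be responsible for provisioning nodes and creating the workload.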
Syllabus
Effortless Scalability: Orchestrating Large Language Model Inference with Kubernetes - Joinal Ahmed
Taught by
CNCF [Cloud Native Computing Foundation]