Improve AI Inference - Serving Models With KServe and vLLM

Linux Foundation via YouTube

Overview

Learn how to enhance AI model inference performance using KServe and vLLM in this 34-minute conference talk from the Linux Foundation. Discover how Red Hat integrates these technologies into OpenShift AI, its MLOps platform, and how Red Hat engineers actively contribute to both upstream projects. Explore the architecture and components of Red Hat OpenShift AI, all derived from open source projects, and examine how KServe serves as the model serving platform within this ecosystem. Dive into the advantages of pairing vLLM with KServe as the runtime for large language models, including faster inference and lower resource consumption through continuous batching, PagedAttention, and speculative decoding. Finally, gain insight into further resource savings through LLM quantization with vLLM's LLM Compressor library, providing practical knowledge for improving AI model deployment and serving efficiency.
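
To help orient viewers, a few hedged sketches of the tools the talk covers follow; they are assembled from the projects' public APIs, not taken from the talk itself, and names such as model IDs, namespaces, and runtime strings are placeholders.

KServe exposes a model through an InferenceService resource. A minimal sketch using the KServe Python SDK, assuming a cluster where a vLLM-backed serving runtime is installed:

```python
from kubernetes import client
from kserve import (
    KServeClient,
    constants,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1ModelFormat,
    V1beta1ModelSpec,
    V1beta1PredictorSpec,
)

# Declare an InferenceService whose predictor loads a Hugging Face model;
# `runtime` selects the serving runtime that backs the predictor.
isvc = V1beta1InferenceService(
    api_version=constants.KSERVE_GROUP + "/v1beta1",
    kind="InferenceService",
    metadata=client.V1ObjectMeta(name="llama-vllm", namespace="models"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            model=V1beta1ModelSpec(
                model_format=V1beta1ModelFormat(name="huggingface"),
                runtime="kserve-huggingfaceserver",      # assumed runtime name
                storage_uri="s3://models/llama-3.1-8b",  # placeholder URI
            )
        )
    ),
)

KServeClient().create(isvc)  # submits the resource to the cluster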

Syllabus

Improve AI Inference (Serving Models) With KServe and vLLM - Matteo Combi, Red Hat

Taught by

Linux Foundation

