Overview
Learn how to improve AI model inference performance using KServe and vLLM in this 34-minute conference talk from the Linux Foundation. Discover how Red Hat integrates these technologies into OpenShift AI, its MLOps platform, and how Red Hat engineers actively contribute to both upstream projects. Explore the architecture and components of Red Hat OpenShift AI, all derived from open source projects, and examine how KServe functions as the model serving platform within this ecosystem. Dive into the advantages of combining vLLM and KServe as the runtime for Large Language Models, including techniques for faster inference and lower resource consumption: continuous batching, PagedAttention, and speculative decoding. Gain insights into further resource optimization through LLM quantization with vLLM's LLM Compressor library, and come away with practical knowledge for improving the efficiency of AI model deployment and serving.
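To give a feel for why continuous batching helps, here is a minimal toy sketch in plain Python. It is not vLLM's scheduler and does not come from the talk: the function names, the request lengths, and the "one token per request per decode step" cost model are all assumptions made for illustration. It only counts how many decode steps are needed when a finished request's slot is refilled immediately, compared with waiting for the whole batch to finish.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Hypothetical baseline: each batch runs until its longest request ends."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    """Toy model of continuous batching: a freed slot is refilled right away."""
    queue = deque(lengths)   # requests waiting, as number of tokens to generate
    running = []             # remaining tokens for each in-flight request
    steps = 0
    while queue or running:
        # Admit queued requests into any free slots before the next step.
        while queue and len(running) < batch_size:
            running.append(queue.popleft())
        # One decode step produces one token for every running request.
        steps += 1
        running = [r - 1 for r in running if r - 1 > 0]
    return steps


# Illustrative output lengths (in tokens); every value is assumed to be >= 1.
lengths = [2, 8, 3, 7, 1, 6]
static = static_batching_steps(lengths, batch_size=2)
continuous = continuous_batching_steps(lengths, batch_size=2)
print(f"static: {static} steps, continuous: {continuous} steps")
```

With mixed output lengths, the static version leaves a slot idle while the longest request in each batch finishes, which is the waste that continuous batching is meant to remove. Real serving engines also account for prefill cost and memory limits, which this sketch ignores.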
Syllabus
Improve AI Inference (serving models) With KServe and vLLM - Matteo Combi, Red Hat
Taught by
Linux Foundation