Overview
Learn how to enhance AI model inference performance using KServe and vLLM in this 34-minute conference talk from the Linux Foundation. Discover Red Hat's integration of these technologies within OpenShift AI, their MLOps platform, and understand how Red Hat engineers actively contribute to both upstream projects.

Explore the architecture and components of Red Hat OpenShift AI, all derived from open source projects, and examine how KServe functions as a model serving platform within this ecosystem. Dive into the advantages of combining vLLM and KServe as the runtime for Large Language Models, including techniques for faster inference and optimized resource consumption through continuous batching, PagedAttention, and speculative decoding. Gain insights into further resource optimization strategies using LLM quantization with the vLLM project's LLM Compressor library, providing practical knowledge for improving AI model deployment and serving efficiency.
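To see why continuous batching improves throughput, the following is a minimal toy simulation (not vLLM's actual scheduler): with static batching, a batch of requests occupies the GPU until its longest sequence finishes, while with continuous batching a finished request's slot is refilled immediately from the waiting queue. The request lengths and batch size below are illustrative assumptions.

```python
from collections import deque

def static_batch_steps(lengths, batch_size):
    """Static batching: each batch runs until its longest request finishes,
    so short requests waste decode steps waiting on long ones."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps

def continuous_batch_steps(lengths, batch_size):
    """Continuous batching: when a request finishes, a waiting request
    takes over its slot on the very next decode step."""
    waiting = deque(lengths)
    running = []  # remaining decode steps for each in-flight request
    steps = 0
    while waiting or running:
        # refill any free slots before the next decode step
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        steps += 1  # one decode step advances every running request by one token
        running = [r - 1 for r in running if r > 1]
    return steps

# One long request (10 decode steps) and three short ones (2 each), batch size 2:
print(static_batch_steps([10, 2, 2, 2], 2))      # → 12 decode steps
print(continuous_batch_steps([10, 2, 2, 2], 2))  # → 10 decode steps
```

In this toy model the short requests slot in beside the long one instead of forcing extra batches, which is the intuition behind vLLM's scheduler; the real system also manages KV-cache memory per token via PagedAttention.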
Syllabus
Improve AI Inference (serving models) With KServe and VLLM - Matteo Combi, Red Hat
Taught by
Linux Foundation