Overview
Learn how to integrate vLLM and KServe for high-performance, scalable large language model (LLM) deployment in this 23-minute conference talk from the Linux Foundation. Discover vLLM, a specialized LLM inference library that delivers high throughput and efficiency using techniques such as PagedAttention, continuous batching, and optimized CUDA kernels. Explore KServe, a Kubernetes-based model-serving platform that provides autoscaling, monitoring, and model versioning for AI models in production. Watch a practical demonstration of how these two open-source projects integrate, combining vLLM's inference optimizations with KServe's scaling and deployment features. Understand how organizations can leverage this integration to deploy LLMs effectively, achieving fast, low-latency inference while scaling seamlessly across cloud platforms.
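To make the integration concrete, the kind of deployment discussed in the talk can be sketched as a KServe InferenceService manifest that serves a Hugging Face model with vLLM as the inference backend. This is a minimal illustrative sketch, not the presenter's exact configuration: the service name, model ID, and resource values are assumptions, and field details may vary by KServe version.

```yaml
# Hypothetical KServe InferenceService using the Hugging Face
# serving runtime, which can use vLLM as its inference engine.
# Names and values below are illustrative placeholders.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llm-example            # assumed service name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface      # selects the Hugging Face runtime
      args:
        - --model_name=llm-example
        - --model_id=meta-llama/Llama-2-7b-chat-hf   # example model
      resources:
        limits:
          nvidia.com/gpu: "1"  # vLLM requires a GPU for inference
        requests:
          nvidia.com/gpu: "1"
```

Once applied (e.g., with `kubectl apply -f`), KServe handles pod scheduling, autoscaling, and request routing, while vLLM inside the predictor container handles batching and memory-efficient attention for the actual token generation.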
Syllabus
Fast Inference, Furious Scaling: Leveraging VLLM With KServe - Rafael Vasquez, IBM
Taught by
Linux Foundation