Efficient LLM Deployment: A Unified Approach with Ray, VLLM, and Kubernetes
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore a conference talk that dives into deploying Large Language Models (LLMs) efficiently using an integrated stack of Ray, vLLM, and Kubernetes. Learn how to combine cloud-native orchestration, distributed computing, and LLMOps practices in the wake of ChatGPT's transformative impact on AI applications. The talk covers the fundamentals of Kubernetes for managing AI workloads across cloud environments, Ray's role as an open-source framework for scaling distributed applications, and vLLM's features for high-performance, memory-efficient inference and serving of large language models. Gain practical insight into integrating these tools to deploy AI solutions more efficiently.
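To make the integration concrete, a deployment along these lines is typically expressed as a KubeRay `RayService` manifest: Kubernetes schedules the Ray cluster, Ray Serve routes requests, and vLLM runs inside the Serve application. The sketch below is illustrative only and not from the talk; the image tags, model, and `import_path` module are placeholders.

```yaml
# Hypothetical RayService manifest (KubeRay CRD) serving an LLM with vLLM.
# All names marked "placeholder" are assumptions, not values from the talk.
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: llm-serving
spec:
  serveConfigV2: |
    applications:
      - name: llm
        route_prefix: /
        import_path: serve_app:deployment   # placeholder Serve app module
        runtime_env:
          pip: ["vllm"]
  rayClusterConfig:
    headGroupSpec:
      rayStartParams:
        dashboard-host: "0.0.0.0"
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray-ml:latest   # placeholder image tag
    workerGroupSpecs:
      - groupName: gpu-workers
        replicas: 1
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                image: rayproject/ray-ml:latest   # placeholder image tag
                resources:
                  limits:
                    nvidia.com/gpu: 1   # vLLM workers need GPU capacity
```

Applied with `kubectl apply -f`, a manifest of this shape lets Kubernetes handle node placement and recovery while Ray Serve autoscales the vLLM replicas behind a single HTTP route.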
Syllabus
Efficient LLM Deployment: A Unified Approach with Ray, vLLM, and Kubernetes - Lily (Xiaoxuan) Liu
Taught by
CNCF [Cloud Native Computing Foundation]