Overview
This conference talk addresses GPU underutilization in enterprise AI deployments, including the widespread problem of cloud GPUs running at less than 15% capacity. It shows how to turn costly, siloed GPU infrastructure into a centralized AI platform that uses Kubernetes orchestration to maximize return on investment. Topics include building a fungible GPU fabric shared across teams to raise utilization, using intelligent autoscaling to align infrastructure spending with real-time demand, and running vLLM with the new llm-d framework to manage and observe distributed inference. The talk also covers managing performance SLOs such as Time to First Token (TTFT) and Tokens Per Second (TPS), and controlling token economics through tiered service offerings. Its central argument is that Kubernetes serves as the core economic engine for GPU optimization rather than merely a technical orchestrator.
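As a rough illustration of the TTFT and TPS metrics named in the overview, the sketch below computes both from per-token timestamps and checks them against a service tier. It is not taken from the talk: the `RequestTrace` type, the `meets_slo` helper, and the threshold values are hypothetical, and TPS is assumed here to mean decode throughput (tokens after the first, divided by the time between the first and last token). Other definitions of TPS exist.

```python
from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Timestamps (in seconds) for one inference request. Hypothetical type."""
    request_sent: float        # when the client sent the request
    token_times: list[float]   # arrival time of each generated token


def ttft(trace: RequestTrace) -> float:
    """Time to First Token: delay between the request and the first token."""
    return trace.token_times[0] - trace.request_sent


def tps(trace: RequestTrace) -> float:
    """Tokens Per Second, assumed to be decode throughput after the first token."""
    n = len(trace.token_times)
    if n < 2:
        return 0.0
    decode_time = trace.token_times[-1] - trace.token_times[0]
    return (n - 1) / decode_time if decode_time > 0 else 0.0


def meets_slo(trace: RequestTrace, max_ttft: float, min_tps: float) -> bool:
    """Check one request against a tier's thresholds (illustrative values only)."""
    return ttft(trace) <= max_ttft and tps(trace) >= min_tps


# Example: first token after 0.25 s, then one token every 0.25 s.
trace = RequestTrace(request_sent=0.0, token_times=[0.25, 0.5, 0.75, 1.0, 1.25])
print(ttft(trace))                                   # 0.25
print(tps(trace))                                    # 4.0
print(meets_slo(trace, max_ttft=0.5, min_tps=3.0))   # True
```

In a tiered offering, each tier would carry its own `max_ttft` and `min_tps` pair, so the same trace can satisfy a relaxed tier while violating a stricter one.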
Syllabus
Stop Allocating GPUs, Start Delivering Intelligence: An Enterprise... Vincent Caldeira & Daniel Oh
Taught by
Linux Foundation