Overview
This conference talk addresses model startup latency in modern inference workloads by caching Triton kernels in OCI container images. Custom kernels written in Triton are compiled just in time (JIT), and that compilation delay is a persistent cold-start bottleneck. The talk presents a solution that wraps the Triton kernel cache in an OCI container image: a working prototype demonstrates how Triton-generated LLVM kernels can be packaged into reusable, portable container layers, producing "hot start" containers deployable directly to Kubernetes. Because the baked-in cache bypasses costly JIT compilation, model startup time drops significantly. The talk is aimed at ML infrastructure builders, OSS compiler developers, and practitioners deploying models at scale, and offers practical techniques for optimizing cold starts in models that use Triton-lang, with insights applicable to containerized, Kubernetes-based ML workflows.
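The general idea can be sketched as a multi-stage container build: a warm-up stage JIT-compiles the model's Triton kernels once at build time (Triton writes its compiled artifacts to the directory named by the `TRITON_CACHE_DIR` environment variable), and a runtime stage copies that cache in as an image layer. This is a minimal illustrative sketch, not the talk's actual implementation; the base image, `warmup_kernels.py`, and `serve.py` are hypothetical placeholders, and JIT-compiling kernels at build time typically requires a GPU-enabled build environment matching the target GPU architecture.

```dockerfile
# Hypothetical sketch: bake a pre-warmed Triton kernel cache into an OCI image
# so containers start "hot" and skip JIT compilation. Image and script names
# are illustrative assumptions.

FROM pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime AS warmup
# Triton reads this env var to decide where to store compiled kernel artifacts.
ENV TRITON_CACHE_DIR=/opt/triton-cache
COPY warmup_kernels.py /tmp/warmup_kernels.py
# Run the model's Triton kernels once; the compiled artifacts land in
# TRITON_CACHE_DIR. (Typically needs a GPU available at build time.)
RUN python /tmp/warmup_kernels.py

FROM pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime
ENV TRITON_CACHE_DIR=/opt/triton-cache
# Reuse the baked cache as a portable, reusable image layer.
COPY --from=warmup /opt/triton-cache /opt/triton-cache
COPY serve.py /app/serve.py
CMD ["python", "/app/serve.py"]
```

Because the cache is an ordinary image layer, it is content-addressed and shared across containers that pull the same image, which is what makes the "hot start" deployable directly to Kubernetes.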
Syllabus
From Cold Start To Warp Speed: Triton Kernel Caching With OCI Container Images - Maryam Tahhan
Taught by
Linux Foundation