Arctic Inference - Open-Source LLM Inference Optimizations at Snowflake

Linux Foundation via YouTube

Overview

Explore cutting-edge open-source optimizations for large language model inference in this 28-minute conference talk from the Linux Foundation. Discover Arctic Inference, an innovative vLLM plugin developed by Snowflake AI Research that delivers comprehensive performance enhancements for LLM inference workloads. Learn about groundbreaking techniques including SwiftKV, Ulysses, Shift Parallelism, Suffix Decoding, and Arctic Speculator that collectively achieve remarkable performance improvements: up to 3.4x faster time-to-first-token (TTFT), 1.7x higher throughput, and 1.75x faster time-per-output-token (TPOT) compared to existing open-source solutions. Understand the technical innovations behind these optimizations and how the pluggable design enables seamless integration into existing vLLM deployments. Gain insights into practical implementation strategies and discover how these powerful tools can enhance inference efficiency and resource utilization in real-world applications, empowering the open-source community with advanced LLM inference capabilities.
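One of the techniques the talk covers, Suffix Decoding, speculates future tokens by matching the tail of the current sequence against previously generated text and proposing the tokens that followed that match as drafts for cheap verification. The sketch below is a minimal conceptual illustration of that idea only; it is not Arctic Inference's implementation, and the function name, parameters, and toy token IDs are invented for this example.

```python
# Conceptual sketch of suffix-based speculative drafting (NOT the
# Arctic Inference implementation): find the longest suffix of the
# current context inside previously generated tokens, and propose
# the tokens that followed that match as draft tokens.

def suffix_propose(context, history, max_match=8, max_draft=4):
    """Return up to `max_draft` draft tokens by matching the longest
    suffix of `context` (up to `max_match` tokens) inside `history`."""
    for n in range(min(max_match, len(context)), 0, -1):
        suffix = context[-n:]
        for i in range(len(history) - n):
            if history[i:i + n] == suffix:
                # Tokens that previously followed this suffix become
                # the speculative draft for the verifier model.
                return history[i + n:i + n + max_draft]
    return []  # no match: fall back to ordinary decoding

# Toy token IDs: the context ends in [7, 9], which appeared earlier
# in the history followed by [2, 7, 9, 4].
history = [5, 7, 9, 2, 7, 9, 4, 4, 1]
context = [3, 7, 9]
print(suffix_propose(context, history))  # → [2, 7, 9, 4]
```

In practice this lookup is done against an efficient suffix structure rather than a linear scan, and the drafted tokens are verified in a single batched forward pass, which is where the speedup comes from.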

Syllabus

Arctic Inference: Open-Source LLM Inference Optimizations at Snowflake - Aurick Qiao, Snowflake

Taught by

Linux Foundation
