DecDEC - A Systems Approach to Advancing Low-Bit LLM Quantization

USENIX via YouTube

Overview

Learn about DecDEC, an inference scheme that improves the quality of low-bit quantized Large Language Models while preserving the memory and latency benefits of quantization. Discover how this systems approach stores residual matrices (the differences between the full-precision and quantized weights) in CPU memory and dynamically fetches corrections only for salient channels, which are identified by activation outliers, so that error compensation adapts as the activation distribution changes. Explore the technical implementation and its results: for a 3-bit Llama-3-8B-Instruct model, perplexity drops from 10.15 to 9.12 with minimal GPU memory overhead and only a 1.7% inference slowdown, which makes the approach particularly valuable for on-device deployment with limited hardware resources.
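The idea described above can be illustrated with a toy sketch in plain Python. This is not the implementation presented in the talk: the uniform quantizer, the exact top-k selection, the matrix sizes, and every function name (`quantize`, `top_k_channels`, `compensated_matvec`) are assumptions made for illustration, and an ordinary list stands in for the residual matrix that DecDEC keeps in CPU memory.

```python
import random

def quantize(w, bits=3, w_max=1.0):
    """Toy uniform quantizer (an assumption, not the quantizer used in the talk)."""
    intervals = 2 ** bits - 1              # 2**bits grid points span [-w_max, w_max]
    step = 2 * w_max / intervals
    clipped = max(-w_max, min(w_max, w))
    return round((clipped + w_max) / step) * step - w_max

def matvec(m, x):
    """Plain matrix-vector product; rows are output channels, columns are input channels."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in m]

def top_k_channels(x, k):
    """Indices of the k input channels with the largest activation magnitude."""
    return sorted(range(len(x)), key=lambda j: abs(x[j]), reverse=True)[:k]

def compensated_matvec(w_q, residual, x, k):
    """Quantized product plus residual corrections for the k salient channels only."""
    y = matvec(w_q, x)
    for j in top_k_channels(x, k):         # only these residual columns are "fetched"
        for i in range(len(y)):
            y[i] += residual[i][j] * x[j]
    return y

def l2_error(a, b):
    """Euclidean distance between two output vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

random.seed(0)
n_out, n_in = 8, 16
w = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
w_q = [[quantize(v) for v in row] for row in w]
# Residual = full-precision weights minus quantized weights.
residual = [[a - b for a, b in zip(rw, rq)] for rw, rq in zip(w, w_q)]

# Activations with two artificial outlier channels.
x = [random.gauss(0, 1) for _ in range(n_in)]
x[3] = 25.0
x[11] = -18.0

exact = matvec(w, x)
err_plain = l2_error(exact, matvec(w_q, x))
err_fixed = l2_error(exact, compensated_matvec(w_q, residual, x, k=2))
print("error without compensation:", err_plain)
print("error with 2 channels compensated:", err_fixed)
```

Because the salient channels are chosen from the current activation vector, a different input selects different residual columns, which is the adaptive behavior the overview describes. In this sketch, setting `k` to the number of input channels recovers the full-precision output, while a small `k` corrects only the channels where quantization error is amplified the most.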

Syllabus

OSDI '25 - DecDEC: A Systems Approach to Advancing Low-Bit LLM Quantization

Taught by

USENIX
