Overview
Learn about DecDEC, a novel inference scheme that improves the quality of low-bit quantized Large Language Models while preserving the memory and latency benefits of quantization. Discover how this systems approach stores the residual matrices (the difference between the full-precision and quantized weights) in CPU memory and dynamically fetches corrections for the salient channels identified by activation outliers, enabling adaptive error compensation that responds to the dynamic nature of activation distributions. Explore the technical implementation, which achieves significant perplexity improvements: for example, it reduces the perplexity of a 3-bit Llama-3-8B-Instruct model from 10.15 to 9.12 while adding minimal GPU memory overhead and only a 1.7% inference slowdown, which makes it particularly valuable for on-device deployment scenarios with limited hardware resources.
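The idea of correcting only the salient channels can be illustrated with a minimal sketch in plain Python. This is a toy under stated assumptions, not the DecDEC implementation: the uniform quantizer, the top-k selection rule, and all names (`quantize_matrix`, `salient_channels`, `compensated_matvec`, `R_cpu`) are hypothetical, and an ordinary list stands in for the residual held in CPU memory.

```python
def quantize_matrix(W, step):
    """Toy uniform quantizer: snap each weight to the nearest multiple of `step`."""
    return [[round(w / step) * step for w in row] for row in W]


def residual(W, Wq):
    """Residual matrix R = W - Wq (what quantization threw away)."""
    return [[w - q for w, q in zip(rw, rq)] for rw, rq in zip(W, Wq)]


def matvec(W, x):
    """Plain matrix-vector product y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def salient_channels(x, k):
    """Indices of the k largest-magnitude activations (assumed outlier rule)."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    return order[:k]


def compensated_matvec(Wq, R_cpu, x, k):
    """Quantized matvec plus residual corrections for k salient input channels.

    Only column c of R_cpu is read for each selected channel c, which stands
    in for fetching a small slice of the residual from CPU memory.
    """
    y = matvec(Wq, x)
    for c in salient_channels(x, k):
        for r in range(len(y)):
            y[r] += R_cpu[r][c] * x[c]
    return y


# Tiny demo: a coarse quantizer zeroes the weights, and one activation is an outlier.
W = [[0.3, 0.3], [0.3, 0.3]]
Wq = quantize_matrix(W, step=1.0)
R_cpu = residual(W, Wq)
x = [10.0, 1.0]

exact = matvec(W, x)                            # full-precision result
plain = matvec(Wq, x)                           # quantized, no compensation
fixed = compensated_matvec(Wq, R_cpu, x, k=1)   # compensate the outlier channel only

print("error without compensation:", abs(exact[0] - plain[0]))
print("error with k=1 compensation:", abs(exact[0] - fixed[0]))
```

Because the selection depends on the current activation vector `x`, the set of corrected channels changes from one input to the next, which is the adaptive behavior the description refers to. Setting `k` to the number of channels recovers the full-precision result, at the cost of reading the whole residual.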
Syllabus
OSDI '25 - DecDEC: A Systems Approach to Advancing Low-Bit LLM Quantization
Taught by
USENIX