Overview
Learn about DecDEC, a novel inference scheme that enhances the quality of low-bit quantized Large Language Models while maintaining the memory and latency benefits of quantization. Discover how this systems approach stores residual matrices in CPU memory and dynamically fetches corrections for salient channels identified by activation outliers, enabling adaptive error compensation that responds to the dynamic nature of activation distributions. Explore the technical implementation that achieves significant perplexity improvements, such as reducing a 3-bit Llama-3-8B-Instruct model's perplexity from 10.15 to 9.12 while adding minimal GPU memory overhead and only 1.7% inference slowdown, making it particularly valuable for on-device deployment scenarios with limited hardware resources.
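The core idea described above — a cheap low-bit matmul on the GPU, plus a correction built from residual columns fetched only for the channels where activations are largest — can be illustrated with a minimal NumPy sketch. This is not DecDEC's actual implementation (the real system overlaps CPU-to-GPU residual fetches with GPU decoding); the function name, shapes, and the crude rounding "quantizer" below are all hypothetical, chosen only to show the selective-correction arithmetic.

```python
import numpy as np

def decdec_style_matmul(x, w_q, scale, residual, k=16):
    """Sketch of dynamic error compensation (hypothetical API, not DecDEC's).

    x        -- activation vector, shape (in_features,)
    w_q      -- low-bit quantized weights stored as integers, shape (out, in)
    scale    -- per-channel dequantization scales, shape (in_features,)
    residual -- full-precision quantization error W - dequant(w_q),
                kept in CPU memory in the real system; shape (out, in)
    k        -- number of salient input channels to correct
    """
    # Base path: dequantize and multiply (runs on the GPU in the real system).
    y = (w_q * scale) @ x

    # Pick the k most salient input channels from activation outliers.
    salient = np.argsort(np.abs(x))[-k:]

    # Fetch only those residual columns and add the correction term.
    y += residual[:, salient] @ x[salient]
    return y
```

Note that when `k` covers every channel, the correction exactly cancels the quantization error, recovering the full-precision product; the scheme's benefit is that a small `k` already compensates the channels that dominate the error, which is why the correction can stay cheap.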
Syllabus
OSDI '25 - DecDEC: A Systems Approach to Advancing Low-Bit LLM Quantization
Taught by
USENIX