Unlocking Local LLMs with Quantization
Overview
Learn about quantization's evolution and its impact on local Large Language Models in this 40-minute conference talk from Hugging Face's Marc Sun. Explore the journey of quantization through influential papers like QLoRA and GPTQ, and discover its practical applications across different stages of model development. Gain insights into pre-training a 1.58-bit model, implementing fine-tuning techniques with PEFT + QLoRA, and optimizing inference performance using torch.compile or custom kernels. Understand how the open-source community is making quantized models more accessible through transformers and GGUF models from llama.cpp, enabling broader adoption of local LLM implementations.
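The techniques surveyed in the talk (GPTQ, QLoRA, GGUF) all build on the same core idea: mapping floating-point weights to a small integer range plus a scale factor. The sketch below illustrates the simplest variant, absmax int8 quantization; it is an illustrative example, not code from the talk, and the function names are this example's own.

```python
import numpy as np

def quantize_absmax(weights: np.ndarray, bits: int = 8):
    """Map float weights to signed integers using a single absmax scale."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for int8
    scale = np.abs(weights).max() / qmax  # largest magnitude maps to qmax
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the integers and the scale."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 0.99], dtype=np.float32)
q, s = quantize_absmax(w)
w_hat = dequantize(q, s)
# Reconstruction error is bounded by half the quantization step (scale / 2).
```

Real schemes refine this with per-group scales, non-uniform codebooks (NF4 in QLoRA), or error-compensating rounding (GPTQ), but the storage win is the same: 8-bit (or 4-bit) integers plus a handful of scales instead of 16- or 32-bit floats.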
Syllabus
Unlocking Local LLMs with Quantization - Marc Sun, Hugging Face
Taught by
Linux Foundation