Understanding 4-bit Quantization and QLoRA - Memory Efficient Fine-tuning of LLMs
Discover AI via YouTube
Overview
Learn how QLoRA uses 4-bit quantization for memory-efficient fine-tuning of Large Language Models in a 42-minute video tutorial that covers both theory and practical implementation. Explore Parameter-Efficient Fine-Tuning (PEFT) methods, with a specific focus on how 4-bit quantization works in QLoRA. Follow along with a hands-on Google Colab demonstration that fine-tunes a Falcon 7B model using QLoRA 4-bit quantization and the Transformer Reinforcement Learning (TRL) library. See how Hugging Face Accelerate supports 4-bit QLoRA models, and access practical code examples for implementation. The tutorial builds on foundational knowledge of LoRA and other PEFT methods.
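As a rough illustration of the idea behind 4-bit quantization, the sketch below quantizes a weight vector block by block with a simple absmax scheme and then reconstructs it. It is not taken from the video: the function names, the block size of 64, and the uniform signed 4-bit grid are assumptions chosen for brevity. QLoRA itself uses the NF4 data type with non-uniform levels, and a real implementation packs two 4-bit codes into each byte instead of storing them as `int8`.

```python
import numpy as np

def quantize_4bit_absmax(weights, block_size=64):
    """Quantize a float array to signed 4-bit codes (-7..7), one scale per block.

    Hypothetical helper for illustration; codes are stored unpacked as int8.
    """
    flat = np.asarray(weights, dtype=np.float32).ravel()
    pad = (-len(flat)) % block_size          # zeros needed to fill the last block
    blocks = np.pad(flat, (0, pad)).reshape(-1, block_size)
    # The largest magnitude in each block maps to the integer 7.
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0                # avoid dividing by zero in all-zero blocks
    codes = np.clip(np.round(blocks / scales), -7, 7).astype(np.int8)
    return codes, scales, len(flat)

def dequantize_4bit_absmax(codes, scales, length):
    """Reconstruct approximate float weights and drop the padding."""
    return (codes.astype(np.float32) * scales).ravel()[:length]

# Toy weights standing in for one layer of a model.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=1000).astype(np.float32)

codes, scales, n = quantize_4bit_absmax(w)
w_hat = dequantize_4bit_absmax(codes, scales, n)
max_err = float(np.abs(w - w_hat).max())
print(f"blocks: {codes.shape[0]}, max abs error: {max_err:.6f}")
```

Each block keeps its own scale, so a single large weight only degrades precision within its block. The rounding error per weight is at most half of that block's scale.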
Syllabus
Understanding 4bit Quantization: QLoRA explained (w/ Colab)
Taught by
Discover AI