Understanding 4-bit Quantization and QLoRA - Memory Efficient Fine-tuning of LLMs
Discover AI via YouTube
Overview
Learn how QLoRA uses 4-bit quantization for memory-efficient fine-tuning of Large Language Models in a 42-minute video tutorial that covers both the theory and a practical implementation. The video reviews Parameter-Efficient Fine-Tuning (PEFT) methods, with a specific focus on how 4-bit quantization works in QLoRA. A hands-on demonstration in Google Colab fine-tunes a Falcon 7B model using QLoRA 4-bit quantization and the Transformer Reinforcement Learning (TRL) library. The tutorial also covers Hugging Face Accelerate's support for 4-bit QLoRA models and provides code examples for implementation. It builds on foundational knowledge of LoRA and other PEFT methods.
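As a rough illustration of the idea behind 4-bit quantization, the sketch below stores each block of weights as 4-bit integer codes plus one floating-point scale per block. It is not taken from the video: the function names and the `block_size` parameter are hypothetical, and it uses a uniform grid of levels for simplicity, whereas QLoRA itself uses the NF4 (NormalFloat) data type with non-uniformly spaced levels.

```python
from typing import List, Tuple

Block = Tuple[List[int], float]  # (4-bit codes, per-block scale)


def quantize_block(weights: List[float]) -> Block:
    """Map one block of floats to integer codes in 0..14 plus one scale."""
    # The largest magnitude in the block becomes the scale (absmax scaling).
    # Fall back to 1.0 for an all-zero block to avoid dividing by zero.
    absmax = max(abs(w) for w in weights) or 1.0
    codes = []
    for w in weights:
        q = round(w / absmax * 7)  # signed level in -7..7
        codes.append(q + 7)        # shift to an unsigned code that fits in 4 bits
    return codes, absmax


def quantize(weights: List[float], block_size: int = 4) -> List[Block]:
    """Split the weights into blocks and quantize each block independently."""
    return [
        quantize_block(weights[i:i + block_size])
        for i in range(0, len(weights), block_size)
    ]


def dequantize(blocks: List[Block]) -> List[float]:
    """Rebuild approximate float weights from the codes and scales."""
    restored = []
    for codes, absmax in blocks:
        restored.extend((c - 7) / 7 * absmax for c in codes)
    return restored


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0, 2.0, -0.1, 0.3, 1.5]
    blocks = quantize(weights, block_size=4)
    print(blocks)
    print(dequantize(blocks))
```

Keeping a separate scale per block limits the damage a single outlier weight can do: it only coarsens the grid for its own block, not for the whole tensor.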
Syllabus
Understanding 4bit Quantization: QLoRA explained (w/ Colab)
Taught by
Discover AI