Overview
Learn to finetune large language models locally using AMD Ryzen AI Max "Strix Halo" hardware through this comprehensive 55-minute tutorial. Discover how to set up and use a pre-configured finetuning toolbox that includes Jupyter notebooks and all necessary dependencies for immediate model training.

Master full-parameter finetuning techniques alongside efficient LoRA and QLoRA methods with 8-bit and 4-bit quantization on popular models including Gemma-3 (1B and 12B variants), Qwen-3, and GPT-OSS-20B. Explore essential aspects of dataset preparation, unified memory management specific to the Strix Halo architecture, model checkpointing strategies, and exporting trained models for inference deployment.

Follow along with hands-on demonstrations of the complete finetuning workflow from initial setup through model inference, including specialized techniques like reasoning-based finetuning using Harmony templates. Access all accompanying notebooks, scripts, and resources to replicate the training processes on your own Framework Desktop system.
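To make the LoRA/QLoRA terminology above concrete before watching: LoRA freezes the pretrained weight matrix and trains only two small low-rank matrices, while QLoRA additionally stores the frozen base weights in 4-bit precision. The sketch below is a simplified numerical illustration of those two ideas in plain Python, not the tutorial's actual code; the dimensions, the whole-matrix symmetric quantizer, and the initialization scheme are illustrative assumptions (real QLoRA uses blockwise NF4 quantization).

```python
import random

random.seed(0)
d, k, r, alpha = 8, 8, 2, 4  # toy dimensions; real layers are far larger

# Frozen pretrained weight W (d x k), as nested lists.
W = [[random.gauss(0, 1) for _ in range(k)] for _ in range(d)]

# QLoRA stores W in 4 bits. Here: simplified symmetric round-to-nearest
# over the whole matrix (real QLoRA quantizes blockwise with NF4).
max_abs = max(abs(w) for row in W for w in row)
scale = max_abs / 7  # signed 4-bit symmetric range is [-7, 7]
W_q = [[max(-7, min(7, round(w / scale))) for w in row] for row in W]
W_deq = [[q * scale for q in row] for row in W_q]  # dequantized for forward pass

# LoRA trains only A (r x k) and B (d x r), with rank r << min(d, k).
# B starts at zero, so the adapted model initially matches the base model.
A = [[random.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]
B = [[0.0] * r for _ in range(d)]

# Effective weight: W_deq + (alpha / r) * (B @ A)
BA = [[sum(B[i][t] * A[t][j] for t in range(r)) for j in range(k)]
      for i in range(d)]
W_eff = [[W_deq[i][j] + (alpha / r) * BA[i][j] for j in range(k)]
         for i in range(d)]

print("trainable params:", r * (d + k), "vs full:", d * k)  # 32 vs 64
```

Even in this toy case the adapter trains half as many parameters as the full matrix; at 12B scale the ratio is dramatically smaller, which is why LoRA and QLoRA fit on a unified-memory machine like Strix Halo where full-parameter finetuning of larger models may not.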
Syllabus
00:00 – Intro and goals
04:09 – Strix Halo Finetuning Toolbox Installation
10:49 – Gemma-3 Notebook Introduction
16:57 – Setup, Imports and Dataset
20:36 – Gemma-3-1B Full Parameter Finetuning
26:28 – Loading Finetuned Model for Inference
28:21 – Gemma-3-12B Full Parameter Finetuning
30:00 – LoRA Finetuning
37:30 – LoRA and QLoRA on Gemma-3 (8-bit and 4-bit)
43:08 – GPT-OSS 20B Finetuning with LoRA
48:13 – GPT-OSS 20B Finetuning with Reasoning Harmony Template
53:43 – Closing and what’s next
Taught by
Donato Capitella