Running Large Language Models on AMD Strix Halo Ryzen AI MAX+ 395 - GLM 4.5-Air-106B and Qwen3-235B Tutorial
Overview
Learn to run large language models like Qwen3-235B and GLM 4.5-Air-106B on AMD Ryzen AI MAX "Strix Halo" systems in this 27-minute tutorial video. Discover how to leverage up to 128GB of unified system memory on Linux-based Strix Halo systems, with demonstrations on the HP Z2 Mini G1a workstation that also apply to other Strix Halo machines such as the GMKtec EVO X2 and Framework Desktop. Master the kernel configuration and unified-memory tuning needed for good performance. Explore the setup process for running these large models locally, including memory requirements as a function of context size, the choice of GPU backend (the AMDVLK and RADV Vulkan drivers versus AMD ROCm), ROCm configuration, and Fedora-based toolbox environments. Access benchmark results, performance analysis, and practical scripts through the accompanying GitHub repository, which contains toolboxes, benchmarks, and VRAM estimation tools. Gain insights into post-training quantization and reverse-engineering the GGUF format while learning to optimize large language model deployment on AMD's latest AI-focused hardware.
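The repository's actual VRAM estimation tools are not reproduced here, but the arithmetic behind "memory requirements vs. context size" can be sketched: quantized weight size plus a KV cache that grows linearly with context length. The snippet below is a minimal illustration under assumed, placeholder architecture numbers (weights_gib, n_layers, n_kv_heads, head_dim are hypothetical values, not figures taken from the video or the repository).

```python
# Hypothetical sketch: estimate memory needed to serve a GGUF model at a
# given context length (quantized weights + KV cache), in the spirit of the
# repository's VRAM-estimation tools. All parameter values below are
# illustrative assumptions, not numbers from the video or repo.

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context: int, bytes_per_elem: float = 2.0) -> float:
    """KV cache for a standard transformer: K and V, per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_elem

def total_gib(weights_gib: float, n_layers: int, n_kv_heads: int,
              head_dim: int, context: int) -> float:
    """Quantized weight file size plus KV cache, in GiB."""
    kv = kv_cache_bytes(n_layers, n_kv_heads, head_dim, context)
    return weights_gib + kv / (1024 ** 3)

if __name__ == "__main__":
    # Example: a ~60 GiB quantized model with assumed architecture numbers,
    # checked against a 128 GiB unified-memory budget at several context sizes.
    for ctx in (8_192, 32_768, 131_072):
        need = total_gib(weights_gib=60.0, n_layers=48,
                         n_kv_heads=8, head_dim=128, context=ctx)
        print(f"context {ctx:>7}: ~{need:.1f} GiB needed "
              f"({'fits within' if need < 128 else 'exceeds'} 128 GiB)")
```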
Syllabus
00:00 - Introduction to AMD "Strix Halo" Ryzen AI MAX 395
01:39 - TL;DR
04:39 - Running LLMs Locally
06:46 - AMD "Strix Halo" Mini PCs
09:36 - HP Z2 Mini G1a Workstation
11:59 - My Setup: Memory + Llama.cpp Builds
14:00 - Vulkan AMDVLK/RADV/ROCm
15:33 - AMD ROCm
17:08 - Fedora-Based Toolboxes
17:32 - Benchmark Results
20:50 - Memory Requirements by Context Size
23:57 - Credits
24:58 - Conclusion
Taught by
Donato Capitella