
Dual AMD Radeon 9700 AI PRO - Building a 64GB LLM/AI Server with Llama.cpp

Donato Capitella via YouTube

Overview

Learn to build a high-performance 64GB VRAM AI workstation using dual AMD Radeon AI PRO 9700 GPUs in this comprehensive 51-minute technical tutorial. Explore the complete hardware requirements for a dual-GPU setup, including critical considerations for airflow management, PCIe lane splitting, and power supply specifications. Discover detailed component specifications including an AMD Ryzen 9 9900X3D CPU, an ASRock X870E Taichi motherboard, 64GB of DDR5 RAM, and a Corsair 1200W power supply, all housed in a high-airflow Fractal Design Torrent case.

Master the software configuration process on Linux using ROCm/Vulkan drivers and Llama.cpp for local large language model inference. Follow step-by-step instructions for OS and ROCm installation, GPU monitoring tool setup, and Llama.cpp toolbox configuration. Compare Vulkan versus ROCm performance characteristics and learn to download models from Hugging Face for testing.

Execute practical demonstrations of LLM inference using llama-cli and llama-server, analyze benchmark results with llama-bench, and evaluate single versus dual GPU performance scaling. Examine the workstation's capability to run large language models across both GPUs simultaneously, providing a cost-effective alternative to NVIDIA RTX 5080 solutions for AI development and research applications.
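The build-and-run workflow described above can be sketched roughly as follows. This is an illustrative outline, not the presenter's exact commands: the backend flags are llama.cpp's standard CMake options, while the model repository and filename are example assumptions.

```shell
# Build llama.cpp with the Vulkan backend
# (for the ROCm/HIP backend, use -DGGML_HIP=ON instead)
git clone https://github.com/ggml-org/llama.cpp
cmake -S llama.cpp -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j

# Download a quantized GGUF model from Hugging Face (example model)
huggingface-cli download bartowski/Meta-Llama-3.1-8B-Instruct-GGUF \
    Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf --local-dir models

# Interactive chat, offloading all layers to the GPUs
./build/bin/llama-cli \
    -m models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf -ngl 99

# OpenAI-compatible HTTP server, splitting layers across both GPUs
./build/bin/llama-server \
    -m models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
    -ngl 99 --split-mode layer --port 8080

# Throughput benchmark (prompt processing and token generation rates)
./build/bin/llama-bench \
    -m models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
```

`--split-mode layer` distributes whole transformer layers across the available GPUs, which is typically the default way llama.cpp scales a model across a multi-GPU setup like the one built in the video.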

Syllabus

00:00 Dual GPU Workstation Intro
02:23 Is this a Good GPU for AI?
05:30 Hardware Component List
06:54 Managing Airflow and Heat
07:49 Understanding PCIe Lane Splitting
10:31 Power Supply Requirements
11:35 OS and ROCm Installation
13:54 Installing GPU Monitoring Tools
16:08 Llama.cpp Toolboxes Overview
19:36 Vulkan vs ROCm
22:56 Setting Up the Toolboxes
25:56 Downloading Models from HF
29:29 Running LLMs via llama-cli and llama-server
34:37 Running llama-bench
37:26 Single vs Dual GPU Performance
42:21 Running Large Models on Dual GPUs
46:56 Summary and Future Plans
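As a back-of-envelope check on what fits in the 64GB VRAM budget discussed in the video: a model's weight memory is roughly parameter count times bits per weight. The helper below is a hypothetical sketch (not from the video) that ignores KV cache and runtime overhead, so real usage will be somewhat higher.

```python
def model_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough weight-memory footprint in GB (1 GB = 1e9 bytes).

    params_billion  : parameter count in billions (e.g. 70 for a 70B model)
    bits_per_weight : average bits per weight (e.g. ~4.5 for Q4_K_M, 16 for FP16)
    """
    return params_billion * bits_per_weight / 8

# A 70B model at ~4.5 bits/weight needs about 39 GB of weights,
# which fits in the 64 GB pooled across two 32 GB GPUs...
print(model_vram_gb(70, 4.5))   # ~39.4 GB

# ...while the same model at FP16 (140 GB) clearly does not.
print(model_vram_gb(70, 16))    # 140.0 GB

# A ~32B model at ~4.5 bits/weight even fits on a single 32 GB card.
print(model_vram_gb(32, 4.5))   # 18.0 GB
```

This kind of estimate explains the single-versus-dual-GPU split seen in the benchmarks: mid-size quantized models fit on one card, while larger models must be split across both.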

Taught by

Donato Capitella

