Overview
Syllabus
0:00 Faster training with multiple GPUs
0:39 Video Overview
1:24 Data Parallel versus Pipeline Parallel versus Fully Sharded Data Parallel
6:38 Downloading a Jupyter notebook as a Python script for multi-GPU, e.g. an Unsloth notebook
7:44 Unsloth vs Transformers for multi-GPU
8:13 Modifying a fine-tuning script for distributed data parallel
9:03 Starting up a GPU in one click for fine-tuning
10:27 Converting a Jupyter notebook to a Python script
11:30 Installation notes for Unsloth, TensorBoard, and uv
13:32 Script modifications required for DDP
18:50 Training script run-through for LoRA
22:46 Setting gradient accumulation steps
24:07 Dataset loading
26:22 Setting up the run name and training parameters
29:30 Running without multi-GPU (single-GPU check)
35:47 Running with multiple GPUs using accelerate config (torchrun can result in run hangs)
41:02 Sanity check: running with accelerate and a single GPU
44:48 Open issues (at time of recording) with loss reporting and with Unsloth at batch sizes larger than one
53:11 Conclusion and shout-outs to spr1nter and rakshith
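Notes

The DDP script modifications covered at 8:13 and 13:32 typically follow a pattern like the sketch below, assuming a Hugging Face TrainingArguments-based fine-tuning script; the parameter values are illustrative, not necessarily the ones used in the video:

```python
import os
import torch
from transformers import TrainingArguments

# Under accelerate/torchrun, each process is told its GPU via LOCAL_RANK.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))
torch.cuda.set_device(local_rank)

args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=1,      # batch size per GPU, not global
    gradient_accumulation_steps=8,
    ddp_find_unused_parameters=False,   # commonly set for LoRA under DDP
    report_to="tensorboard",
)

# Only rank 0 should print or log, otherwise output is duplicated per GPU.
if local_rank == 0:
    print(f"Training on {torch.cuda.device_count()} GPU(s)")
```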
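For the gradient accumulation settings at 22:46, the key arithmetic is that the effective batch size multiplies across per-device batch size, accumulation steps, and GPU count; the numbers below are illustrative:

```python
# Effective batch size under data parallelism:
# effective = per_device_batch_size * gradient_accumulation_steps * num_gpus
per_device_batch_size = 1
gradient_accumulation_steps = 8
num_gpus = 2

effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 16
```

When moving from one GPU to two, halving gradient_accumulation_steps keeps the effective batch size, and hence the loss curve, roughly comparable between runs.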
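Dataset loading at 24:07 presumably uses the datasets library; a minimal sketch with a placeholder dataset name, not necessarily the one used in the video:

```python
from datasets import load_dataset

# Placeholder dataset; substitute the dataset for your own run.
dataset = load_dataset("yahma/alpaca-cleaned", split="train")
print(dataset[0])
```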
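At 35:47 the run is launched with accelerate rather than torchrun, which the video notes can hang. The usual flow is to run `accelerate config` once to answer the hardware questions, then `accelerate launch --num_processes 2 train.py` for two GPUs, or `--num_processes 1` for the single-GPU sanity check at 41:02; the script name and process counts here are illustrative.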
Taught by
Trelis Research