Overview
In this 53-minute tutorial, James Briggs shows how to fine-tune tiny language models into expert agents using Low-Rank Adaptation (LoRA). Learn why small LLMs struggle with function calling and how to overcome that limitation by fine-tuning the 1B-parameter Llama 3.2 model on Salesforce's xLAM dataset with NVIDIA's NeMo Microservices. The tutorial walks through the complete workflow: deploying NeMo Microservices, preparing the dataset, splitting it into train, validation, and test sets, setting up the NeMo Data Store and Entity Store, running LoRA training with NeMo Customizer, deploying NIMs, and implementing chat completion. It links to GitHub repositories with code examples and follows a clear timeline structure for easy navigation through each stage of the process.
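The train-validation-test split mentioned above can be sketched in plain Python. This is a minimal illustration, not the tutorial's actual code: the record fields and the 80/10/10 proportions are assumptions standing in for the xLAM function-calling data.

```python
# Sketch of a train/validation/test split step (assumed 80/10/10 proportions;
# the tutorial applies this idea to Salesforce's xLAM function-calling dataset).
import random

def train_val_test_split(records, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle records deterministically and slice into three lists."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Toy stand-in records (xLAM pairs each query with tool definitions and calls).
data = [{"query": f"q{i}", "tools": [], "answers": []} for i in range(100)]
train, val, test = train_val_test_split(data)
print(len(train), len(val), len(test))  # 80 10 10
```

A fixed seed keeps the split reproducible across runs, which matters when the resulting files are uploaded to the NeMo Data Store for training.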
Syllabus
00:00 LoRA Fine-tuning Agents
01:34 NeMo Microservices
03:19 NeMo Deployment
07:49 Deploying NeMo Microservices
16:54 xLAM Dataset Preparation
26:49 Train Validation Test Split
28:59 NeMo Data Store and Entity Store
34:14 LoRA Training with NeMo Customizer
42:03 Deploying NIMs
47:10 Chat Completion with NVIDIA NIMs
49:47 NVIDIA NeMo Microservices
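The final syllabus step, chat completion against a deployed NIM, can be sketched as an ordinary HTTP request, since NIM inference endpoints are OpenAI-compatible. The host, port, and `model@adapter` identifier below are illustrative assumptions, not values from the tutorial.

```python
# Hedged sketch: a NIM chat completion is an OpenAI-style JSON POST.
# Host/port and model name are assumptions for illustration only.
import json
import urllib.request

def build_chat_request(base_url, model, messages, temperature=0.1):
    """Build (but do not send) a /v1/chat/completions request."""
    payload = {"model": model, "messages": messages, "temperature": temperature}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request(
    "http://localhost:8000",                  # assumed NIM host/port
    "llama-3.2-1b-instruct@my-lora-adapter",  # hypothetical model@adapter id
    [{"role": "user", "content": "What's the weather in Berlin?"}],
)
# urllib.request.urlopen(req)  # send only once the NIM is actually deployed
```

Because the endpoint follows the OpenAI schema, the same request also works through any OpenAI-compatible client by pointing its base URL at the NIM.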
Taught by
James Briggs