Overview
Learn how to train a language model from scratch in this 3.5-hour tutorial taught by Imad Saddik. Using Moroccan Darija as the example language, the course walks through the complete process: loading text data, training a tokenizer with Byte Pair Encoding, understanding the Transformer architecture, pre-training a model, creating a supervised fine-tuning dataset, and building your own AI assistant. All resources, including code, notebooks, datasets, and tokenizers, are available through the provided GitHub repositories and Hugging Face links. The sections progress from basic concepts to scaling techniques, so the material stays accessible to beginners while giving practical implementation experience.
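To give a feel for the tokenization step mentioned above, here is a minimal sketch of Byte Pair Encoding training in plain Python. It is not taken from the course materials: the function names (`count_pairs`, `merge_pair`, `train_bpe`) and the whitespace-based word splitting are assumptions made for illustration, and the course itself may use a tokenizer library instead.

```python
from collections import Counter


def count_pairs(words):
    """Count adjacent symbol pairs in a {tuple_of_symbols: frequency} vocabulary."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus.

    Hypothetical helper: real tokenizers also handle bytes, special tokens,
    and pre-tokenization rules that this sketch ignores.
    """
    # Start with each word split into individual characters.
    words = dict(Counter(tuple(word) for word in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break
        # Greedily merge the most frequent adjacent pair.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        words = merge_pair(words, best)
    return merges
```

Calling `train_bpe(text, num_merges)` on a corpus returns the ordered list of learned merges. A production tokenizer would normally come from a dedicated library rather than a loop like this one.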
Syllabus
0:00:00 About the Course
0:03:03 Introduction
0:07:24 Training Data
0:15:33 Tokenization
0:29:00 The Transformer Architecture
0:52:21 Pre-training
1:24:46 Fine-tuning Dataset
1:33:05 Instruction Fine-tuning
2:06:17 Fine-tuning with LoRA
2:20:39 Let's Scale Everything
3:09:40 Bonus
3:27:10 Conclusion
Taught by
freeCodeCamp.org