Overview
Syllabus
0:00 Whisper preparation and fine-tuning with Unsloth
0:40 Resources: Trelis.com/ADVANCED-audio
1:23 One-click GPU and Jupyter Notebook Setup
3:37 Whisper vs Voxtral vs Kyutai
4:48 Installation of Unsloth and Whisper Timestamped
7:52 Using Whisper Large versus Turbo
8:53 Video Overview / Layout - How to prepare data and train
11:33 Audio recording and transcription with Whisper Timestamped
13:06 Whisper vs Whisper-Timestamped and the motivation for word timestamps
15:34 Creating text/audio segments using word-timestamped transcripts
18:46 Whisper's segment timestamps are hard to chunk into segments under 30s
19:50 Word timestamps with Whisper Timestamped
20:56 Automated vs manual transcript cleanup techniques
28:48 Dataset creation from audio and text segments
20:53 Fine-tuning with Unsloth
33:26 Word Error Rate - Teacher forcing versus predict_with_generate
36:11 Training hyperparameters and losses / results
37:33 Evaluating base and fine-tuned model performance
39:15 Merging, pushing to the Hub, and preparing for inference (see also https://www.youtube.com/watch?v=qXtPPgujufI)
40:21 Conclusion
Taught by
Trelis Research