YouTube videos curated by Class Central.
Classroom Contents
Professional Quality Voice Cloning - Open Source vs ElevenLabs
- 1 0:00 Fine-tuning Text-to-Speech Models with Unsloth
- 2 0:53 Video Overview
- 3 1:47 Video Resources
- 4 2:26 Voice Quality Examples: ElevenLabs vs Open Source
- 5 4:52 The recipe for professional-quality voice cloning
- 6 6:23 How do token-based text-to-speech models work?
- 7 14:08 Data Preparation and Training Overview
- 8 16:02 Data preparation, cleaning and chunking for voice cloning
- 9 24:05 Audio transcription from uploaded audio
- 10 25:42 Dataset chunking and pushing to the Hugging Face Hub
- 11 29:49 Loading Sesame CSM-1B and LoRA adapters (full fine-tuning is also possible and is included in the repo)
- 12 34:36 Dataset loading and creating an eval split
- 13 37:42 Training Hyperparameters
- 14 40:08 Running inference on the fine-tuned model and evaluating
- 15 43:57 LoRA fine-tuning of Orpheus by Canopy Labs: data loading is very different!
- 16 50:27 Running inference and listening to the output quality with Orpheus
- 17 53:15 Professional voice cloning with ElevenLabs
- 18 56:18 Examining TensorBoard logs from the Sesame LoRA fine-tuning
- 19 57:27 Upcoming video on serving Orpheus with vLLM
- 20 58:10 Conclusion