Overview
Learn to fine-tune Sesame's CSM-1B text-to-speech model on a voice extracted from a YouTube video. This hands-on workshop covers the complete pipeline, from data preparation through fine-tuning to evaluation. Topics include processing audio data, preparing training datasets, applying fine-tuning strategies to create custom voice models, and evaluating both model performance and the quality of the generated speech. The workshop uses real-world data sources throughout and is aimed at AI engineers, machine learning practitioners, and developers interested in audio processing and generative AI applications.
Syllabus
Text-to-Speech Data Preparation and Fine-tuning Workshop - Ronan McGovern
Taught by
AI Engineer