Text-to-Speech and Voice Cloning Course - How Machines Process Text

Valerio Velardo - The Sound of AI via YouTube

Overview

This 41-minute lecture from The Monster Text-to-Speech and Voice Cloning Course explains how text-to-speech (TTS) systems turn raw text into speech-ready phonemes: a system has to read before it can speak. The lecture starts with text normalization, which expands numbers, abbreviations, and complex formatting into plain words. It then covers Grapheme-to-Phoneme (G2P) conversion, comparing rule-based approaches (a pronunciation dictionary plus fallback rules for unknown words) with learned approaches built on sequence-to-sequence models.

The lecture also addresses the homograph problem, where identically spelled words need different pronunciations depending on context, and the ambiguity resolution techniques used to handle it. Tools covered include CMUDict, Phonemizer, DeepPhonemizer, and g2p_en. A final section looks at how modern end-to-end TTS systems learn text processing implicitly, which reduces the need for explicit preprocessing steps. Course materials are available through the GitHub repository, and discussion takes place in the dedicated TTS course channel of The Sound of AI Slack community.
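To make the rule-based pipeline described above concrete, here is a minimal Python sketch of normalization followed by dictionary lookup with fallback rules. It is not taken from the lecture: the tiny `LEXICON`, the abbreviation and digit tables, the letter rules, and every function name are hypothetical stand-ins (a real system would typically use a full lexicon such as CMUDict and far richer rules), and numbers are read digit by digit purely for simplicity.

```python
import re

# Hypothetical toy lexicon in ARPAbet-style symbols, standing in for a real
# pronunciation dictionary such as CMUDict.
LEXICON = {
    "doctor": ["D", "AA", "K", "T", "ER"],
    "two": ["T", "UW"],
    "cats": ["K", "AE", "T", "S"],
}

# Hypothetical normalization tables (illustrative, far from complete).
ABBREVIATIONS = {"dr.": "doctor", "mr.": "mister"}
DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

# Naive fallback rules: a few digraphs plus one sound per letter.
DIGRAPH_RULES = {"th": "TH", "sh": "SH", "ch": "CH"}
LETTER_RULES = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F", "g": "G",
    "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L", "m": "M", "n": "N",
    "o": "AA", "p": "P", "q": "K", "r": "R", "s": "S", "t": "T", "u": "AH",
    "v": "V", "w": "W", "x": "K S", "y": "Y", "z": "Z",
}


def normalize(text):
    """Lowercase, expand known abbreviations, and spell out digits."""
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
            continue
        token = token.strip(".,!?;:")
        if re.fullmatch(r"[0-9]+", token):
            # Simplification: read numbers one digit at a time.
            words.extend(DIGITS[d] for d in token)
        elif token:
            words.append(token)
    return words


def fallback_g2p(word):
    """Apply digraph rules first, then single-letter rules."""
    phonemes = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPH_RULES:
            phonemes.append(DIGRAPH_RULES[pair])
            i += 2
        else:
            # Characters without a rule (e.g. apostrophes) are skipped.
            phonemes.extend(LETTER_RULES.get(word[i], "").split())
            i += 1
    return phonemes


def g2p(word):
    """Dictionary lookup first; fall back to rules for unknown words."""
    if word in LEXICON:
        return list(LEXICON[word])
    return fallback_g2p(word)


def text_to_phonemes(text):
    """Run the full sketch: normalize, then convert each word."""
    return [(word, g2p(word)) for word in normalize(text)]


if __name__ == "__main__":
    for word, phonemes in text_to_phonemes("Dr. Smith has 2 cats."):
        print(word, phonemes)
```

The fallback rules are deliberately crude: "has" comes out ending in S rather than Z, and nothing here could choose between the two readings of a homograph such as "read". Gaps like these are what larger lexicons, learned G2P models, and context-aware disambiguation are meant to close.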

Syllabus

0:00 Intro
0:12 TTS pipeline
2:20 Text processing
5:00 Normalization
7:31 Normalization tools
9:55 Grapheme-to-phoneme
14:36 Rule-based G2P
19:20 Learned G2P
24:07 Ambiguity problem
33:20 Modern end-to-end TTS
35:38 G2P tools
38:17 Takeaways

Taught by

Valerio Velardo - The Sound of AI
