Overview
Explore an in-depth analysis of a paper on end-to-end adversarial text-to-speech (TTS) synthesis in this 41-minute video lecture. The lecture covers the problems with traditional multi-stage TTS pipelines and how the paper addresses the alignment problem with a dedicated aligner module. Topics include the adversarial training setup, the architectures of the discriminator and the generator, the spectrogram prediction loss, and the use of dynamic time warping to account for temporal variations in the generated audio. The method operates directly on character or phoneme input sequences and is reported to achieve speech quality comparable to state-of-the-art models.
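To make the dynamic time warping idea concrete, here is a minimal sketch of the classic DTW recurrence between two sequences of spectrogram-like frames. It is not the paper's loss: a training objective would typically need a differentiable (soft-min) relaxation, whereas this sketch uses the textbook hard minimum. The function names `frame_distance` and `dtw`, and the choice of Euclidean distance per frame, are illustrative assumptions.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two frames (assumed: equal-length lists of floats)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dtw(x, y):
    """Classic DTW cost between frame sequences x (length n) and y (length m).

    acc[i][j] holds the cheapest cumulative cost of aligning x[:i] with y[:j],
    where each step may advance x, advance y, or advance both.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_distance(x[i - 1], y[j - 1])
            # Hard minimum over the three allowed predecessor cells.
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


# Usage: a time-stretched copy of a sequence aligns at zero cost,
# even though the two sequences have different lengths.
reference = [[0.0], [1.0], [2.0], [3.0]]
stretched = [[0.0], [0.0], [1.0], [2.0], [2.0], [3.0]]
print(dtw(reference, stretched))  # 0.0
```

A frame-by-frame loss could not even compare these two sequences, since their lengths differ; DTW instead searches over monotonic alignments, which is why it tolerates audio that is correct but slightly faster or slower than the reference.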
Syllabus
- Intro & Overview
- Problems with Text-to-Speech
- Adversarial Training
- End-to-End Training
- Discriminator Architecture
- Generator Architecture
- The Alignment Problem
- Aligner Architecture
- Spectrogram Prediction Loss
- Dynamic Time Warping
- Conclusion
Taught by
Yannic Kilcher