Overview
This 41-minute video lecture analyzes a paper on end-to-end adversarial text-to-speech (TTS) synthesis. It begins with the problems of traditional multi-stage TTS pipelines, then shows how the end-to-end approach handles the alignment problem with a dedicated aligner module. The lecture covers the adversarial training technique, the discriminator and generator architectures, the spectrogram prediction loss, and the use of dynamic time warping to account for temporal variation in the generated audio. The method operates directly on character or phoneme input sequences and synthesizes speech of a quality comparable to state-of-the-art models.
Syllabus
- Intro & Overview
- Problems with Text-to-Speech
- Adversarial Training
- End-to-End Training
- Discriminator Architecture
- Generator Architecture
- The Alignment Problem
- Aligner Architecture
- Spectrogram Prediction Loss
- Dynamic Time Warping
- Conclusion
Taught by
Yannic Kilcher