End-to-End Adversarial Text-to-Speech - Paper Explained

Yannic Kilcher via YouTube

Overview

This 41-minute video lecture walks through the paper "End-to-End Adversarial Text-to-Speech." It covers the drawbacks of traditional multi-stage TTS pipelines and shows how the paper's end-to-end model handles the alignment problem with a dedicated aligner module. Topics include the adversarial training technique, the discriminator and generator architectures, the spectrogram prediction loss, and the use of dynamic time warping to account for timing variations in the generated audio. The method operates directly on character or phoneme input sequences and achieves speech quality comparable to state-of-the-art models.
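
To make the dynamic time warping idea from the overview concrete, here is a minimal Python sketch of the textbook algorithm on two short 1-D sequences. It is an illustration only, not the paper's loss: the function name `dtw_distance`, the absolute-difference cost, and the toy sequences are assumptions chosen for this example, and the paper applies the idea to spectrograms rather than to plain numbers.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two 1-D sequences.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # assumed local cost: absolute difference
            # Extend the cheapest of the three allowed moves:
            # advance in a only, advance in b only, or advance in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Same shape, different timing: the second sequence lingers on some values.
reference = [0.0, 1.0, 2.0, 1.0, 0.0]
stretched = [0.0, 0.0, 1.0, 2.0, 2.0, 1.0, 0.0]

print(dtw_distance(reference, stretched))
```

The two sequences have different lengths, so a frame-by-frame comparison is not even defined, yet the warping path can match every value in one to an equal value in the other. That tolerance to local stretching and compression is what makes this family of alignments useful when generated and reference speech differ in timing.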

Syllabus

- Intro & Overview
- Problems with Text-to-Speech
- Adversarial Training
- End-to-End Training
- Discriminator Architecture
- Generator Architecture
- The Alignment Problem
- Aligner Architecture
- Spectrogram Prediction Loss
- Dynamic Time Warping
- Conclusion

Taught by

Yannic Kilcher
