Text-to-Speech and Voice Cloning Course - Neural TTS Revolution

Valerio Velardo - The Sound of AI via YouTube

Overview

Explore how deep learning transformed text-to-speech in this 41-minute video lecture, which traces the neural shift that began with WaveNet (2016) and Tacotron (2017). Discover how end-to-end learning, learned representations, and neural vocoders replaced hand-designed features to produce synthetic voices that sound natural and expressive. Learn how the field moved from concatenative synthesis to neural approaches, how the two-stage TTS pipeline pairs an acoustic model with a vocoder, and how mel spectrograms serve as the bridge between text and audio. Examine key neural vocoder architectures, including WaveNet, WaveGlow, and HiFi-GAN, along with the sequence-to-sequence models with attention that made more sophisticated synthesis possible. Delve into parallel TTS architectures such as FastSpeech and Glow-TTS that improved efficiency and quality, and see how these advances paved the way for voice cloning and expressive speech generation. Investigate modern codec-based generation systems such as VALL-E, AudioLM, and SPEAR-TTS, and consider the open challenges and future directions in neural TTS research. This lecture is part of a course series on state-of-the-art concepts in speech synthesis and voice cloning.

Syllabus

Intro
The deep learning breakthrough
Core neural innovations
2-stage neural pipeline
WaveNet
Tacotron
What makes neural TTS work
Parallel neural generation
Unlocking voice cloning
Modern TTS architectures
End-to-end
Codec-based voice cloning
Open challenges
Takeaways

Taught by

Valerio Velardo - The Sound of AI
