AudioGen: Textually Guided Audio Generation - Paper Explained
Aleksa Gordić - The AI Epiphany via YouTube
Overview
This video explains the paper "AudioGen: Textually Guided Audio Generation". It covers why text-to-audio generation is hard, compares AudioGen with VQ-GAN and SoundStream, and walks through the model's audio representation, LSTM networks, losses, and complex-valued STFTs. It then turns to audio language modeling, multi-stream audio inputs, the data and augmentation techniques, and the paper's results.
Syllabus
Intro
Why is text-to-audio hard?
Comparison with VQ-GAN
Comparison with SoundStream
AudioGen overview
Deep dive: audio representation, LSTM
Losses explained
Complex-valued STFTs
Audio Language Modeling
Multi-stream audio inputs
Data and augmentations
Results
Outro
Taught by
Aleksa Gordić - The AI Epiphany