Building Awesome Speech-to-Text Transformers from Scratch - One Line of PyTorch at a Time
Neural Breakdown with AVB via YouTube
Overview
Syllabus
0:00 - Intro
0:36 - What audio datasets look like
4:30 - Tokenizing text
9:34 - Data Preprocessing
11:38 - MFCCs and Encoder-Decoder networks
14:20 - Network Architecture
17:59 - Coding the Convolutional Block
26:40 - Coding attention and Transformers
30:20 - Residual Vector Quantizers
32:57 - Coding RVQs
37:44 - Optimizing RVQs
43:50 - Putting it together
48:50 - Connectionist Temporal Classification (CTC) Loss
50:53 - Training!
Taught by
Neural Breakdown with AVB