Streaming Speech to Text Models - Kyutai vs Whisper

Trelis Research via YouTube

Overview

Learn to run and compare streaming speech-to-text (STT) models in this technical tutorial, which sets Kyutai's real-time transcription against OpenAI's Whisper. Set up and run Kyutai STT on a Mac, run streaming transcription in a Jupyter notebook, and use word-level timestamps to locate each word in the audio. Explore text- and audio-assisted transcription, then run a fast streaming STT server written in Rust and aimed at production use. Compare the Whisper and Kyutai architectures, review the theory of timestamping in speech recognition, and see how Kyutai is trained on Whisper-timestamped data. The demos cover English and French, and the streaming approach is compared against the batch models Whisper and Voxtral.

Syllabus

0:00 Streaming Speech to Text Demo with Kyutai STT
0:42 Demo in French
1:05 Video Overview
2:42 Resources & Repo
3:15 Running Kyutai STT on your Mac
5:15 Run streaming STT in a notebook
5:58 Word timestamping
8:52 Text and Audio Assisted Transcription
11:46 Fast streaming STT server with Rust
15:27 Streaming vs Whisper STT vs Voxtral
19:53 Theory of Timestamping
22:55 Whisper vs Kyutai STT architectures
24:34 How Kyutai is trained with Whisper-timestamped data
25:50 Wrap up

Taught by

Trelis Research