Speaker Diarization - From Modular to End-to-End Systems - Day 3 Morning
Center for Language & Speech Processing (CLSP), JHU via YouTube
Overview
This lecture, delivered by Federico Landini of Deepgram at JSALT 2025, covers the fundamentals and advanced techniques of speaker diarization, the task of determining "who spoke when" in an audio recording. It starts with traditional modular systems built on clustering and progresses to modern end-to-end systems, in which a single neural network processes the audio and produces the diarization output directly. Systems covered along the way include VBx, EEND (End-to-End Neural Diarization), and DiaPer. Landini has contributed to both modular and end-to-end diarization approaches, led successful teams in the DIHARD and VoxSRC challenges, and has industry experience from internships at major tech companies including Meta, Facebook, Apple, and Microsoft. The lecture also addresses practical applications and implementation strategies, with emphasis on open-source recipes and models.
Syllabus
Day 3 morning - JSALT 2025 - Landini: Speaker Diarization
Taught by
Center for Language & Speech Processing (CLSP), JHU