Speaker Diarization - From Modular to End-to-End Systems - Day 3 Morning
Center for Language & Speech Processing (CLSP), JHU via YouTube
Overview
Explore the fundamentals and advanced techniques of speaker diarization in this lecture delivered by Federico Landini from Deepgram at JSALT 2025. Learn about the task of determining "who spoke when" in audio recordings, starting with traditional modular systems based on clustering approaches and progressing to modern end-to-end systems, where a single neural network processes the audio and directly outputs the diarization result. Follow the evolution from clustering-based methods such as VBx to end-to-end neural models such as EEND (End-to-End Neural Diarization) and DiaPer. Gain insights from an expert who has contributed significantly to both modular and end-to-end diarization approaches, led successful teams in the DIHARD and VoxSRC challenges, and has industry experience from internships at major tech companies including Meta, Facebook, Apple, and Microsoft. Understand the practical applications and implementation strategies for speaker diarization systems, with emphasis on open-source recipes and models.
Syllabus
Day 3 morning - JSALT 2025 - Landini: Speaker Diarization
Taught by
Center for Language & Speech Processing (CLSP), JHU