Speaker Diarization - From Modular to End-to-End Systems - Day 3 Morning

Explore the fundamentals and advanced techniques of speaker diarization in this lecture by Federico Landini of Deepgram at JSALT 2025. Speaker diarization is the task of determining "who spoke when" in an audio recording. The lecture starts with traditional modular systems built around clustering. It then moves to modern end-to-end systems, in which a single neural network processes the audio and produces the diarization output directly. It traces this evolution through concrete models: the clustering-based VBx, then the end-to-end EEND (End-to-End Neural Diarization) and DiaPer systems.

Landini has contributed to both modular and end-to-end diarization approaches. He has led successful teams in the DIHARD and VoxSRC challenges, and he has industry experience from internships at Meta (formerly Facebook), Apple, and Microsoft. The lecture also covers practical applications and implementation strategies for speaker diarization systems. It emphasizes open-source recipes and models that advance the field of speech processing and audio analysis.