
Massachusetts Institute of Technology

Smaller, Stronger, and Duration-Scalable Audio Learners

Massachusetts Institute of Technology via YouTube

Overview

In this 31-minute conference talk from MIT, postdoctoral researcher Saurabhchand Bhati presents research on state-space models (SSMs) for audio processing. The talk introduces the Knowledge Distilled Audio SSM (DASS), a model that outperforms Transformers on AudioSet, achieving an mAP of 48.9 while reducing model size by one-third. It covers how DASS overcomes the limitations SSMs have traditionally shown on short audio tagging tasks while maintaining performance on long-duration audio, as measured by the Audio Needle In A Haystack test: the models identify sound events in hour-long recordings, where Transformer models fail beyond 50 seconds. Bhati's research focuses on unsupervised spoken term discovery, representation learning, and multimodal learning.

Syllabus

Saurabhchand Bhati - Smaller, Stronger, and Duration-Scalable Audio Learners

Taught by

MIT Embodied Intelligence
