
Target Sound Extraction with Language-oriented Audio Diffusion Transformer

Center for Language & Speech Processing (CLSP), JHU via YouTube

Overview

Explore a 31-minute research presentation from Johns Hopkins University's Center for Language & Speech Processing that introduces SoloAudio, a diffusion-based generative model for target sound extraction. Learn how the system replaces the traditional U-Net backbone with a skip-connected Transformer operating on latent features. Discover how the model integrates CLAP embeddings to support both audio-oriented and language-oriented sound extraction, and how it leverages synthetic audio generated by text-to-audio models during training. Examine SoloAudio's ability to generalize to out-of-domain data, handle novel sound events, and perform zero-shot and few-shot extraction.
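The listing does not spell out SoloAudio's exact architecture, but the skip-connection pattern it mentions (feeding early Transformer-block activations into later blocks, U-Net style, instead of using a convolutional U-Net) can be sketched in a few lines. The NumPy toy below is a hypothetical illustration only: the `block` function stands in for a full Transformer block, the conditioning vector stands in for a CLAP embedding, and attention/diffusion details are omitted entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

def block(x, w):
    # Stand-in for a Transformer block: nonlinear map with a residual connection.
    return x + np.tanh(x @ w)

def skip_connected_stack(latents, cond, depth=3, dim=8):
    """Toy sketch of a skip-connected Transformer over latent features.

    latents: (seq, dim) latent audio features
    cond:    (dim,) conditioning embedding (e.g. a CLAP-style vector)
    """
    # Randomly initialized weights; a real model would learn these.
    ws_in = [rng.standard_normal((dim, dim)) * 0.1 for _ in range(depth)]
    ws_out = [rng.standard_normal((2 * dim, dim)) * 0.1 for _ in range(depth)]

    x = latents + cond          # inject the condition at every position
    skips = []
    for w in ws_in:             # first half: run blocks, save activations
        x = block(x, w)
        skips.append(x)
    for w in ws_out:            # second half: concatenate the matching saved
        x = np.concatenate([x, skips.pop()], axis=-1) @ w  # activation (skip)
    return x                    # (seq, dim) refined latents
```

The point of the pattern is that late blocks see early, less-processed features directly, which is the same motivation as the skip connections in a U-Net.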

Syllabus

SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer -- Helin Wang

Taught by

Center for Language & Speech Processing (CLSP), JHU

