TinyDenoiser: RNN-based Speech Enhancement on Multi-Core MCU with Mixed FP16-INT8 Post-Training Quantization
EDGE AI FOUNDATION via YouTube
Overview
Learn about implementing RNN-based Speech Enhancement algorithms on multi-core microcontrollers in this technical conference talk. Explore an optimized methodology for deploying TinyDenoiser models on the GAP9 MCU platform, which features 1+9 RISC-V cores with vector INT8 and FP16 arithmetic support. Discover software pipelining techniques that interleave the parallel computation of LSTM/GRU units with memory transfers, and understand a novel FP16-INT8 Mixed-Precision Post-Training Quantization scheme that maintains accuracy while reducing computational overhead. Examine experimental results showing a 4× speedup over the FP16 baseline and 10× better energy efficiency than single-core MCU solutions. Delve into key topics including speech enhancement fundamentals, RNN architectures, hardware mapping strategies, optimization techniques such as double buffering and tensor promotion, and real-world performance metrics on the target hardware.
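The mixed-precision idea described above can be illustrated with a minimal sketch: quantize each weight tensor to INT8, measure the resulting signal-to-quantization-noise ratio (SQNR), and fall back to FP16 for tensors that quantize poorly. This is an illustrative policy written in NumPy, not the talk's actual GAP9 toolchain code; the function names and the SQNR threshold are assumptions for demonstration.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: w ≈ scale * q."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def mixed_precision_ptq(weights, sqnr_threshold_db=30.0):
    """Keep a tensor in INT8 only if its quantization SQNR is high
    enough; otherwise fall back to FP16.  Illustrative rule, not the
    exact criterion used in the talk."""
    out = {}
    for name, w in weights.items():
        q, scale = quantize_int8(w)
        err = w - q.astype(np.float32) * scale
        sqnr = 10.0 * np.log10(np.sum(w ** 2) / (np.sum(err ** 2) + 1e-12))
        if sqnr >= sqnr_threshold_db:
            out[name] = ("int8", q, scale)       # cheap vector INT8 path
        else:
            out[name] = ("fp16", w.astype(np.float16), None)  # accuracy fallback
    return out
```

A smooth, uniformly spread weight tensor typically clears the threshold and stays INT8, while a tensor dominated by a single outlier (which inflates the quantization scale) drops to FP16, mirroring why a per-tensor mixed scheme can keep accuracy with mostly INT8 compute.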
Syllabus
Intro
Speech Enhancement (or Denoising)
RNN for Speech Enhancement
RISC-V MultiCore MCU Platform (GAP9)
RNN Mapping on HW
Optimizations: Double Buffering
Optimizations: Tensor Promotion
Post-Training Quantization
Latency & Power on target HW/SW
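The double-buffering optimization listed in the syllabus hides memory-transfer latency by fetching the next tile of weights/activations while the current one is being computed. A minimal Python sketch of the ping-pong pattern follows; on GAP9 the transfer would be a DMA from off-cluster memory, whereas here a background thread stands in for the DMA engine, and `fetch`/`compute` are hypothetical callbacks.

```python
from concurrent.futures import ThreadPoolExecutor

def process_tiles_double_buffered(fetch, compute, n_tiles):
    """Overlap the 'DMA' fetch of tile i+1 with compute on tile i.
    fetch(i) returns a tile; compute(tile) returns its result.
    A single worker thread plays the role of the DMA engine."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending = dma.submit(fetch, 0)              # prefetch first tile
        for i in range(n_tiles):
            tile = pending.result()                 # wait for tile i to land
            if i + 1 < n_tiles:
                pending = dma.submit(fetch, i + 1)  # start transfer of next tile
            results.append(compute(tile))           # compute overlaps the fetch
    return results
```

With two buffers in flight, compute cores stall only on the first fetch; every subsequent transfer is amortized behind useful work, which is what makes the interleaved LSTM/GRU execution described in the talk effective.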
Taught by
EDGE AI FOUNDATION