TinyDenoiser: RNN-based Speech Enhancement on Multi-Core MCU with Mixed FP16-INT8 Post-Training Quantization
EDGE AI FOUNDATION via YouTube
Overview
Learn about implementing RNN-based Speech Enhancement algorithms on multi-core microcontrollers in this technical conference talk. Explore an optimized methodology for deploying TinyDenoiser models on the GAP9 MCU platform, which features 1+9 RISC-V cores with vector INT8 and FP16 arithmetic support. Discover software pipelining techniques that interleave the parallel computation of LSTM/GRU units with memory transfers, and understand a novel FP16-INT8 Mixed-Precision Post-Training Quantization scheme that maintains accuracy while reducing computational overhead. Examine experimental results showing a 4× speedup over the FP16 baseline and 10× better energy efficiency than single-core MCU solutions. Delve into key topics including speech enhancement fundamentals, RNN architectures, hardware mapping strategies, optimization techniques such as double buffering and tensor promotion, and real-world performance metrics on the target hardware.
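The mixed-precision idea described above can be illustrated with a minimal sketch: quantize each weight tensor to INT8, measure the resulting signal-to-quantization-noise ratio (SQNR), and fall back to FP16 for tensors that quantize poorly. This is an illustrative policy written in NumPy, not the talk's actual GAP9 toolchain code; the function names and the SQNR threshold are assumptions for demonstration.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: w ≈ scale * q."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def mixed_precision_ptq(weights, sqnr_threshold_db=30.0):
    """Keep a tensor in INT8 only if its quantization SQNR is high
    enough; otherwise fall back to FP16.  Illustrative rule, not the
    exact criterion used in the talk."""
    out = {}
    for name, w in weights.items():
        q, scale = quantize_int8(w)
        err = w - q.astype(np.float32) * scale
        sqnr = 10.0 * np.log10(np.sum(w ** 2) / (np.sum(err ** 2) + 1e-12))
        if sqnr >= sqnr_threshold_db:
            out[name] = ("int8", q, scale)       # cheap vector INT8 path
        else:
            out[name] = ("fp16", w.astype(np.float16), None)  # accuracy fallback
    return out
```

A smooth, uniformly spread weight tensor typically clears the threshold and stays INT8, while a tensor dominated by a single outlier (which inflates the quantization scale) drops to FP16, mirroring why a per-tensor mixed scheme can keep accuracy with mostly INT8 compute.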
Syllabus
Intro
Speech Enhancement (or Denoising)
RNN for Speech Enhancement
RISC-V MultiCore MCU Platform (GAP9)
RNN Mapping on HW
Optimizations: Double Buffering
Optimizations: Tensor Promotion
Post-Training Quantization
Latency & Power on target HW/SW
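The double-buffering optimization listed in the syllabus hides memory-transfer latency by fetching the next tile of weights/activations while the current one is being computed. A minimal Python sketch of the ping-pong pattern follows; on GAP9 the transfer would be a DMA from off-cluster memory, whereas here a background thread stands in for the DMA engine, and `fetch`/`compute` are hypothetical callbacks.

```python
from concurrent.futures import ThreadPoolExecutor

def process_tiles_double_buffered(fetch, compute, n_tiles):
    """Overlap the 'DMA' fetch of tile i+1 with compute on tile i.
    fetch(i) returns a tile; compute(tile) returns its result.
    A single worker thread plays the role of the DMA engine."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending = dma.submit(fetch, 0)              # prefetch first tile
        for i in range(n_tiles):
            tile = pending.result()                 # wait for tile i to land
            if i + 1 < n_tiles:
                pending = dma.submit(fetch, i + 1)  # start transfer of next tile
            results.append(compute(tile))           # compute overlaps the fetch
    return results
```

With two buffers in flight, compute cores stall only on the first fetch; every subsequent transfer is amortized behind useful work, which is what makes the interleaved LSTM/GRU execution described in the talk effective.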
Taught by
EDGE AI FOUNDATION