Non-Stationary Multi-Stream Processing

Explore advanced multi-stream automatic speech recognition techniques in this 58-minute lecture delivered by Hervé Bourlard from the Swiss Federal Institute of Technology at Lausanne. Delve into the extension of standard hidden Markov model (HMM) approaches through multi-stream processing, where the speech signal is analyzed by independent "experts" that each focus on a different characteristic of the signal. Learn how stream likelihoods or posteriors are combined at some temporal stage to produce a global recognition output, with particular emphasis on the approach reported as most successful: integrating stream likelihoods over all possible stream combinations. Discover the mathematical models underlying this framework and examine their relationship to psycho-acoustic evidence. Investigate subband-based speech recognition as a specific case of multi-stream processing. Understand the HMM2 approach, where emission probabilities are estimated by state-specific, feature-based HMMs that merge stream information while modeling possible correlations between streams. Analyze recognition results obtained in non-stationary noise and see, through practical examples, how fast adaptation can be achieved by tuning only a small set of parameters.
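
As a rough illustration of integrating stream likelihoods over all possible stream combinations, the sketch below forms a weighted sum over every non-empty subset of streams. It is not taken from the lecture: the function name `full_combination_likelihood`, the treatment of a subset's likelihood as the product of its stream likelihoods (an independence assumption), the uniform default weights, and the omission of the empty subset are all assumptions made for illustration.

```python
from itertools import combinations
from math import prod


def full_combination_likelihood(stream_likelihoods, weights=None):
    """Combine per-stream likelihoods over all non-empty stream subsets.

    stream_likelihoods: list of likelihoods p(x_i | state), one per stream.
    weights: optional dict mapping a subset (tuple of stream indices) to the
             assumed probability that this subset is the reliable one.
             Defaults to uniform weights (an illustrative assumption).
    """
    n = len(stream_likelihoods)
    # Enumerate every non-empty subset of stream indices.
    subsets = [c for r in range(1, n + 1) for c in combinations(range(n), r)]
    if weights is None:
        weights = {s: 1.0 / len(subsets) for s in subsets}
    # Assumption: streams are independent given the state, so a subset's
    # likelihood is the product of its members' likelihoods.
    return sum(
        weights[s] * prod(stream_likelihoods[i] for i in s) for s in subsets
    )


if __name__ == "__main__":
    # Two hypothetical subband streams for one frame and one HMM state.
    print(full_combination_likelihood([0.5, 0.2]))
```

Putting all the weight on a single subset reduces the sum to that subset's likelihood alone, which is one way to picture how reweighting a handful of combination weights could favor the streams that are currently less corrupted by noise.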