CDG-Based Language Models for Large Vocabulary Continuous Speech Recognition
Center for Language & Speech Processing (CLSP), JHU via YouTube
Overview
Explore the development of Constraint Dependency Grammar (CDG) based language models for large vocabulary continuous speech recognition systems in this comprehensive research presentation. Learn about CDG's unique ability to represent properties across diverse languages and its capacity for word-level lexicalization with rich lexical features for modeling subcategorization and wh-movement without parameter space explosion. Discover two distinct language model types: an almost-parsing model utilizing SuperARV data structures that integrate words, lexical features, and syntactic constraints, and a full parser-based model incorporating complete parse information through modifiee links. Examine the insights gained from initial CDG grammar induction experiments that significantly enhanced model quality. Understand the evaluation results showing the almost-parsing model's substantial error rate reduction in LVCSR tasks with lower time complexity compared to full parser-based approaches, while the full CDG parser-based model demonstrates performance comparable to or exceeding state-of-the-art parser-based language models on the DARPA Wall Street Journal CSR task.
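To make the almost-parsing idea concrete, here is a minimal sketch of a language model over joint (word, tag) pairs, in the spirit of the SuperARV-based model described above. This is an illustrative toy: the tags are ordinary POS-like labels standing in for real SuperARVs (which additionally encode lexical features and syntactic constraints), the class name and add-one smoothing are the author's simplifications, and nothing here reflects the actual system's implementation.

```python
from collections import defaultdict

class AlmostParsingLM:
    """Toy bigram model over (word, tag) tuples. Real SuperARV models
    condition on richer structures and use stronger smoothing; this
    only illustrates the joint word/tag factorization."""

    def __init__(self):
        self.bigram = defaultdict(int)   # counts of ((w1,t1), (w2,t2))
        self.context = defaultdict(int)  # counts of (w,t) as a context
        self.vocab = set()               # observed (w,t) continuations

    def train(self, tagged_sents):
        for sent in tagged_sents:
            tokens = [("<s>", "<s>")] + sent + [("</s>", "</s>")]
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigram[(prev, cur)] += 1
                self.context[prev] += 1
                self.vocab.add(cur)

    def prob(self, prev, cur):
        # Add-one smoothing over the joint (word, tag) vocabulary.
        v = len(self.vocab)
        return (self.bigram[(prev, cur)] + 1) / (self.context[prev] + v)

    def sentence_prob(self, sent):
        tokens = [("<s>", "<s>")] + sent + [("</s>", "</s>")]
        p = 1.0
        for prev, cur in zip(tokens, tokens[1:]):
            p *= self.prob(prev, cur)
        return p
```

Because words and tags are scored jointly, a grammatical word order receives higher probability than a scrambled one after training on even a tiny tagged corpus, which is the intuition behind the reported error-rate reductions in LVCSR rescoring.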
Syllabus
Mary Harper: CDG-Based Language Models
Taught by
Center for Language & Speech Processing (CLSP), JHU