Learn to build practical voice AI systems that run efficiently on edge devices in this 13-minute conference talk. Explore the fundamental trade-offs of cloud-based speech systems, including network latency, privacy concerns, and compute cost, then discover a hybrid architecture in which devices handle fast, predictable tasks locally while offloading to cloud resources for broader capabilities when needed.

Master sparsity techniques that compress large speech models while preserving accuracy, using production-tested operating points that balance true-positive rate against false activations. Understand data augmentation strategies and confusable-phrase training that harden models against room acoustics, accents, and similar-sounding commands, reducing false wake incidents and improving user trust. Discover how synthetic data generated by large cloud models can train compact on-device models for robustness without extensive data collection.

Examine how AI integrates with traditional digital signal processing on edge processors, combining noise suppression, beamforming, and voice activity detection to deliver cleaner signals to on-device recognizers and to improve cloud performance when offloading. Finally, apply sparsity-driven Pareto curves to scale this approach across products, selecting the model that best fits each product's power and memory budget and enabling instant responses with privacy-first design for real-world voice interface deployment.
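To make the sparsity idea concrete, here is a minimal magnitude-pruning sketch in NumPy: zero out the smallest-magnitude weights until a target fraction of the tensor is sparse. The function name and the 90% sparsity target are illustrative assumptions, not values from the talk; production pruning is typically iterative and followed by fine-tuning to recover accuracy.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights so that `sparsity`
    fraction of the entries become zero (one-shot magnitude pruning)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))       # stand-in for one layer's weights
pruned = magnitude_prune(w, 0.9)       # keep only the largest 10% of weights
print(f"achieved sparsity: {np.mean(pruned == 0):.2f}")
```

A 90%-sparse layer stores roughly a tenth of the original nonzero weights, which is what makes large speech models fit on-device when the runtime exploits the zeros.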
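The DSP front end described above often ends with a voice activity gate so the recognizer only runs on frames that plausibly contain speech. A minimal energy-based VAD sketch follows; the frame size, threshold, and function name are illustrative assumptions (real systems use more robust spectral or model-based features):

```python
import numpy as np

def energy_vad(frames: np.ndarray, threshold_db: float = -40.0) -> np.ndarray:
    """Flag frames whose RMS level (dB relative to full scale 1.0)
    exceeds a fixed threshold. `frames` is (num_frames, frame_len)."""
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    level_db = 20.0 * np.log10(rms + 1e-12)  # epsilon avoids log(0)
    return level_db > threshold_db

# 10 ms frames at 16 kHz: two near-silent frames around one loud tone frame.
t = np.arange(160) / 16000.0
quiet = 1e-4 * np.ones(160)
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)
frames = np.stack([quiet, tone, quiet])
print(energy_vad(frames))  # only the tone frame should be flagged
```

Gating this early keeps the always-on power budget low and, when offloading, avoids streaming silence to the cloud.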
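Selecting from a Pareto curve can be sketched as keeping only non-dominated models (no other model is at least as good on every axis and strictly better on one), then picking the most accurate model inside a product's memory and power budget. All model names and numbers below are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str
    accuracy: float   # e.g. wake-word true-positive rate
    memory_kb: int
    power_mw: float

def pareto_front(models):
    """Drop any model dominated by another on (accuracy, memory, power)."""
    def dominates(a, b):
        return (a.accuracy >= b.accuracy and a.memory_kb <= b.memory_kb
                and a.power_mw <= b.power_mw
                and (a.accuracy > b.accuracy or a.memory_kb < b.memory_kb
                     or a.power_mw < b.power_mw))
    return [m for m in models if not any(dominates(o, m) for o in models)]

def pick(models, max_memory_kb, max_power_mw):
    """Most accurate Pareto-optimal model within the product's budget."""
    feasible = [m for m in pareto_front(models)
                if m.memory_kb <= max_memory_kb and m.power_mw <= max_power_mw]
    return max(feasible, key=lambda m: m.accuracy, default=None)

candidates = [
    Model("tiny",    0.92,  64, 1.0),
    Model("small",   0.95, 128, 2.0),
    Model("medium",  0.97, 512, 5.0),
    Model("bloated", 0.94, 600, 6.0),  # dominated by "medium"
]
print(pick(candidates, max_memory_kb=256, max_power_mw=3.0))
```

Different products (earbuds vs. smart speakers) simply query the same curve with different budgets, which is the scaling argument the talk makes.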