Keywords: Convex optimization, Spoken dialogue systems, Large language models, ASR
TL;DR: Fast Convex Neural Networks For Bilingual Robust ASR Language Detection
Abstract: Globalization and multiculturalism have produced numerous diverse dialects, such as Singaporean-accented English and regional Mandarin speech. These speech variants remain significantly under-represented even in high-resource language datasets.
Consequently, standard spoken dialogue systems frequently misidentify the user’s input language, compromising response accuracy regardless of downstream language model capability.
To address this, we propose a robust ASR framework that handles dialectal variance with minimal computational overhead and lightweight training costs. Our Convex Language Detection (CLD) framework integrates a convex neural network whose training problem admits a globally optimal solution in polynomial time; we solve it efficiently with ADMM implemented in JAX, achieving sub-500 ms inference latency. CLD offers strong convergence guarantees, stability across runs, and reduced sample complexity. As a motivating case study, CLD significantly improves transcription accuracy on bilingual inputs when integrated with Whisper encoders. These results enable more inclusive multilingual interactions and highlight promising directions for convex optimization methods in spoken dialogue systems.
Submission Number: 140