DOC-RAG: ASR Language Model Personalization with Domain-Distributed Co-occurrence Retrieval Augmentation

Published: 01 Jan 2024, Last Modified: 14 Aug 2025LREC/COLING 2024EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: We propose DOC-RAG - Domain-distributed Co-occurrence Retrieval Augmentation for ASR language model personalization aiming to improve the automatic speech recognition of rare word patterns in unseen domains. Our approach involves contrastively training a document retrieval module to rank external knowledge domains based on their semantic similarity with respect to the input query. We further use n-gram co-occurrence distribution to recognize rare word patterns associated with specific domains. We aggregate the next word probability distribution based on the relative importance of different domains. Extensive experiments on three user-specific speech-to-text tasks for meetings, TED talks, and financial earnings calls show that DOC-RAG significantly outperforms strong baselines with an 8-15% improvement in terms of perplexity and a 4-7% reduction in terms of Word Error Rates in various settings.
Loading