Multi-Modal Retrieval For Large Language Model Based Speech Recognition

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Retrieval is a widely adopted approach for improving language models by leveraging external information. As the field moves towards multi-modal large language models, it is important to extend pure text-based retrieval methods to incorporate other modalities as well, for applications across the wide spectrum of machine learning tasks and data types. In this work, we propose multi-modal retrieval with two approaches: kNN-LM and cross-attention techniques. We demonstrate the effectiveness of our retrieval approaches empirically by applying them to automatic speech recognition tasks with access to external information. Under this setting, we show that speech-based multi-modal retrieval outperforms text-based retrieval, and yields up to a $\sim 50\%$ improvement in word error rate over the multi-modal language model baseline. Furthermore, we achieve state-of-the-art recognition results on the Spoken-SQuAD question answering dataset.
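As background for the first of the two approaches named in the abstract, the sketch below illustrates the standard kNN-LM interpolation (Khandelwal et al., 2020): the next-token distribution of a base language model is mixed with a distribution induced by the k nearest neighbours retrieved from a datastore of (hidden-state key, next-token value) pairs. This is a minimal, illustrative sketch only; all function names, shapes, and hyperparameters (k, the interpolation weight lambda) are assumptions, and the paper's multi-modal extension with speech-based keys is not shown here.

```python
# Minimal kNN-LM sketch (illustrative assumptions throughout; this is the
# generic text-based technique the abstract builds on, not the authors'
# multi-modal implementation).
import numpy as np

def knn_lm_next_token_probs(query, keys, values, lm_probs, vocab_size, k=8, lam=0.25):
    """Interpolate base-LM probabilities with a kNN distribution built from
    a datastore of (hidden-state key, next-token value) pairs."""
    # Squared L2 distance from the query hidden state to every stored key.
    dists = np.sum((keys - query) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]
    # Softmax over negative distances of the k retrieved neighbours.
    weights = np.exp(-dists[nearest])
    weights /= weights.sum()
    # Scatter neighbour mass onto the tokens stored as their values
    # (duplicate token ids accumulate correctly via np.add.at).
    knn_probs = np.zeros(vocab_size)
    np.add.at(knn_probs, values[nearest], weights)
    # Final distribution: lambda * kNN + (1 - lambda) * base LM.
    return lam * knn_probs + (1.0 - lam) * lm_probs

# Toy usage with random data, just to show the shapes involved.
rng = np.random.default_rng(0)
V, D, N = 100, 16, 1000          # vocab size, hidden dim, datastore size
keys = rng.normal(size=(N, D))   # stored hidden states
values = rng.integers(0, V, size=N)  # next tokens observed at those states
lm_probs = rng.dirichlet(np.ones(V))
query = rng.normal(size=D)
probs = knn_lm_next_token_probs(query, keys, values, lm_probs, V)
assert np.isclose(probs.sum(), 1.0)
```

The cross-attention variant mentioned in the abstract instead feeds the retrieved entries into the model as an additional attention context rather than interpolating output distributions; the paper should be consulted for its exact formulation.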
Paper Type: long
Research Area: Speech recognition, text-to-speech and spoken language understanding
Contribution Types: NLP engineering experiment
Languages Studied: English
Preprint Status: There is no non-anonymous preprint and we do not intend to release one.
A1: yes
A1 Elaboration For Yes Or No: 6
A2: n/a
A3: yes
A3 Elaboration For Yes Or No: 1
B: yes
B1: yes
B1 Elaboration For Yes Or No: 3
B2: n/a
B3: yes
B3 Elaboration For Yes Or No: 3
B4: yes
B4 Elaboration For Yes Or No: 3
B5: yes
B5 Elaboration For Yes Or No: 3
B6: yes
B6 Elaboration For Yes Or No: 3
C: yes
C1: yes
C1 Elaboration For Yes Or No: 3
C2: yes
C2 Elaboration For Yes Or No: 3
C3: yes
C3 Elaboration For Yes Or No: 4
C4: yes
C4 Elaboration For Yes Or No: 3
D: no
D1: n/a
D2: n/a
D3: n/a
D4: n/a
D5: n/a
E: no
E1: n/a