SALMONN-omni: A Speech Understanding and Generation LLM in a Codec-free Full-duplex Framework

26 Sept 2024 (modified: 24 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Large speech text model, Full-duplex model
Abstract:

Speech large language models (LLMs) offer a unified approach to handling various speech-processing tasks with a single autoregressive model built on discrete speech and audio codecs. Unlike traditional pipeline-based systems, which use separate components for speech recognition, understanding, and generation, end-to-end speech LLMs can capture both verbal and non-verbal information, such as paralinguistic and speaker characteristics. This enables full-duplex capabilities, allowing the system to listen and speak simultaneously with low latency, making it well suited to conversational AI. In this paper, we introduce a novel codec-free, full-duplex framework for speech understanding and generation, and present SALMONN-omni, an instance of this speech LLM. SALMONN-omni can listen to its own generated speech and background sounds while speaking. To bridge the frame-rate gap between text and audio, we propose a novel "thinking" step, ensuring high performance on pre-trained tasks. Using a two-stage "understand then generate" training approach, SALMONN-omni effectively addresses a variety of streaming speech tasks, including speech recognition, synthesis, enhancement, dereverberation, target speaker extraction, and spoken question answering.
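The abstract only sketches the "thinking" step at a high level, but the underlying frame-rate mismatch it addresses is easy to illustrate: streaming audio arrives at a fixed frame rate, while the model emits text tokens at a variable and usually lower rate. A minimal sketch of one plausible alignment strategy, assuming a special padding token (the token name, function, and rates below are illustrative assumptions, not details from the paper):

```python
# Hypothetical illustration of aligning a text-token stream to a fixed-rate
# audio frame stream, as a frame-rate-gap "thinking" step might require.
# The THINK token and the per-block alignment scheme are assumptions for
# illustration; the paper's actual mechanism is not specified in the abstract.

THINK = "<think>"

def align_text_to_audio(text_tokens, num_audio_frames):
    """Pad a block's text tokens with THINK tokens (or truncate them) so the
    text stream carries exactly one token per audio frame in this block."""
    if len(text_tokens) >= num_audio_frames:
        return text_tokens[:num_audio_frames]
    return text_tokens + [THINK] * (num_audio_frames - len(text_tokens))

# Example: a block covering 4 audio frames but containing only 1 text token
# is padded with 3 thinking tokens so both streams stay frame-synchronous.
aligned = align_text_to_audio(["hello"], 4)
print(aligned)  # ['hello', '<think>', '<think>', '<think>']
```

The padded positions give the model autoregressive steps that consume audio frames without forcing premature text output, which is one way a full-duplex model can keep listening while deciding when to speak.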

Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7455
