Reinforcement Learning Enhanced Full-Duplex Spoken Dialogue Language Models for Conversational Interactions

Chen Chen; Ke Hu; Chao-Han Huck Yang; Ankita Pasad; Edresson Casanova; Weiqing Wang; Szu-Wei Fu; Jason Li; Zhehuai Chen; Jagadeesh Balam; Boris Ginsburg

Published: 08 Jul 2025, Last Modified: 26 Aug 2025, Venue: COLM 2025, License: CC BY 4.0

Keywords: Full-Duplex model, Spoken Dialogue Models, Speech-to-Speech model

TL;DR: Use reinforcement learning to optimize full-duplex spoken dialogue models.

Abstract: Mainstream spoken dialogue language models (SDLMs) primarily handle turn-based interactions by alternating between processing user speech and generating responses. Recently emerging full-duplex SDLMs have shown more natural and engaging conversational performance by listening and speaking simultaneously. However, the complex dynamics of human conversation pose unique challenges for full-duplex SDLMs: beyond generating reasonable responses, these models must exhibit diverse and prompt conversational behaviors in real-time interaction with the user. In this work, we present an efficient full-duplex SDLM optimized by Online Reinforcement with Interactive Speech Evaluation (ORISE). In ORISE, we design a customized reward function, derived from automated annotations of online-generated speech, that guides the model toward well-formed, speech-text-aligned responses. Experimental results show that ORISE effectively improves robustness and accuracy in handling conversational dynamics, including turn-taking, user barge-in, and backchanneling. Furthermore, ORISE enables the model to adapt to unseen noise conditions without relying on any labeled data, demonstrating its generalization to real-world scenarios.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html

Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html

Submission Number: 1363
