Reinforcement Learning Enhanced Full-Duplex Spoken Dialogue Language Models for Conversational Interactions

Published: 08 Jul 2025, Last Modified: 26 Aug 2025, COLM 2025, CC BY 4.0
Keywords: Full-Duplex model, Spoken Dialogue Models, Speech-to-Speech model
TL;DR: Uses reinforcement learning to optimize full-duplex spoken dialogue models.
Abstract: Mainstream spoken dialogue language models (SDLMs) primarily handle turn-based interactions by alternating between processing user speech and generating responses. Recently emerging full-duplex SDLMs have shown more natural and engaging conversational performance by listening and speaking simultaneously. However, the complex dynamics of human conversation pose unique challenges for full-duplex SDLMs: beyond generating reasonable responses, these models must exhibit diverse and prompt conversational behaviors in real-time interactions with the user. In this work, we present an efficient full-duplex SDLM optimized by Online Reinforcement with Interactive Speech Evaluation (ORISE). In ORISE, we design a customized reward function derived from automated annotations of online-generated speech to guide the model toward well-formed, speech-text-aligned responses. Experimental results show that ORISE effectively improves robustness and accuracy in handling conversational dynamics, including turn-taking, user barge-in, and backchanneling. Furthermore, ORISE enables the model to adapt to unseen noise conditions without relying on any labeled data, demonstrating that ORISE generalizes to real-world scenarios.
Submission Number: 1363