DuplexGen: Adaptive Synthesis of Human-AI Turn-Taking Dialogues
Keywords: Turn-taking dialogue, Spoken dialogue system, Full-duplex language model
TL;DR: We propose a turn-taking dialogue synthesis framework to improve full-duplex speech language models.
Abstract: Spoken dialogue systems aim to facilitate natural human-AI interaction through dynamic turn-taking behaviors, such as backchanneling and floor-taking. Although humans naturally adapt their turn-taking to different social contexts, modeling AI turn-taking in a scenario-adaptive manner remains challenging. In addition, existing human-human corpora, often collected from unstructured conversations on random topics, fail to capture the scenario-dependent variation in turn-taking behaviors. To this end, we introduce **DuplexGen**, a dialogue synthesis framework that characterizes and generates appropriate AI turn-taking behaviors in a scenario-adaptive manner, ranging from lenient waiting to strategic floor-taking. By collecting a small set of human annotations for each scenario, we align large language model (LLM) turn-taking annotations with human judgments. We demonstrate that DuplexGen synthesizes turn-taking dialogues that are better aligned with human preferences, and that training a full-duplex language model on the synthesized dialogues enables scenario-specific turn-taking behaviors.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 106