Temporally-Aware Turn-Taking: A Framework for Precise Timing and Style Control Towards Natural Full-Duplex Interaction
Keywords: Temporally-Aware Turn-Taking; Full-duplex System
Abstract: Unlike half-duplex systems, which are restricted to reactive turn-taking, natural full-duplex interaction requires precise timing for both reactive responses and proactive behaviors, such as system-initiated interruptions and backchanneling. However, current speech LLMs struggle in full-duplex mode because of imprecise turn timing or significant reasoning degradation. To achieve natural and controllable full-duplex interaction, we introduce a Lightweight, Temporally-Aware Turn Controller (LTA-TC), which provides fine-grained turn-timing predictions and time-sensitive style controls. LTA-TC is designed for broad compatibility: it can either enable full-duplex interaction for half-duplex LLMs or augment the performance of native full-duplex architectures. Because existing full-duplex data is primarily synthetic and lacks proactive behavior annotations, we construct ProTurn, a real-world human-human dataset featuring region-based reactive and proactive labels. By categorizing behaviors via timing offsets, ProTurn supports stylistic instructions across five turn-transition styles and five backchannel styles. To evaluate the turn-timing awareness of full-duplex systems, we introduce an evaluation framework that assesses performance at both the chunk level and the turn level. Experimental results demonstrate that LTA-TC achieves superior performance in the timing of interruptions and backchanneling, time-sensitive style control, and response quality.
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: multi-modal dialogue systems; applications; dialogue state tracking
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 2339