Predictive Turn-Taking: Leveraging Language Models to Anticipate Turn Transitions in Human-Robot Dialogue
Abstract: Natural and engaging spoken dialogue systems require seamless turn-taking coordination to avoid awkward interruptions and unnatural pauses. Traditional systems often rely on simplistic silence thresholds, treating a predetermined period of silence as the end of the user's turn, which leads to a suboptimal interaction experience. This work explores the potential of Large Language Models (LLMs) for improved turn-taking prediction. Building upon research that uses linguistic cues, we investigate how LLMs, with their rich contextual knowledge and semantic encoding of language, can be applied to this task. We hypothesize that by analysing the dialogue context, syntactic structure, and pragmatic cues within the user's utterance, LLMs can offer more accurate turn-completion predictions. This research evaluates the capabilities of recent LLMs, including Google's Gemini, models available through OpenAI's API, Anthropic's Claude 2, and Meta AI's Llama 2, to predict turn-ending points solely from textual information, and demonstrates how conversations between elderly users and companion robots can be enhanced by LLM-powered end-of-turn prediction.
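To illustrate the general idea, the following is a minimal sketch of how an LLM could be prompted to judge, from text alone, whether a user's utterance constitutes a complete turn. It is not the paper's actual prompt, model, or evaluation protocol: the model name, prompt wording, and COMPLETE/INCOMPLETE labels are illustrative assumptions, shown here only to make the text-only end-of-turn prediction setup concrete.

```python
# Illustrative sketch (not the study's exact setup): ask an LLM whether the
# latest user utterance looks like a completed turn, using only the text.
# The model name, prompt wording, and labels below are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def predict_turn_end(dialogue_context: str, user_utterance: str) -> bool:
    """Return True if the LLM judges the utterance to be a complete turn."""
    prompt = (
        "You are assisting a companion robot with turn-taking.\n"
        f"Dialogue so far:\n{dialogue_context}\n\n"
        f'Latest user utterance: "{user_utterance}"\n'
        "Based only on the text (syntax, semantics, pragmatics), has the user "
        "finished their turn? Answer with exactly COMPLETE or INCOMPLETE."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat-capable model would do
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep the yes/no style output deterministic
    )
    answer = resp.choices[0].message.content.strip().upper()
    return answer.startswith("COMPLETE")


if __name__ == "__main__":
    # An incomplete utterance should keep the floor with the user.
    print(predict_turn_end("Robot: How was your day?", "Well, I went to the"))
```

In a deployed system, a prediction of INCOMPLETE would keep the robot listening rather than taking the turn at the first silence threshold, which is the behaviour the abstract contrasts against.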