Beyond Continuity: Challenges of Context Switching in Multi-Turn Dialogue with LLMs

Published: 02 Mar 2026 · Last Modified: 08 Mar 2026 · ICLR 2026 Workshop ICBINB · CC BY 4.0
Keywords: LLM, conversational assistants, dialogue systems, multi-turn conversations, failure modes
TL;DR: A comprehensive experimental evaluation of modern LLMs' ability to detect topic shifts in multi-turn conversations, with an analysis of their failure modes.
Abstract: Users interacting with Large Language Models (LLMs) in a multi-turn conversation routinely refine their requests or pivot to new topics. LLMs, however, often miss these topic shifts and carry over irrelevant context from previous turns, leading to inaccurate responses. In this paper, we stress-test the multi-turn understanding of LLMs and study the following two sub-tasks: (1) detecting whether the user pivots or refines in the current turn, and (2) shortlisting relevant context from previous turns. To this end, we construct synthetic benchmarks based on real-world datasets from varied domains, so as to simulate context shifts at different levels of difficulty. We then evaluate the zero-shot performance of ten LLMs (open-weight, closed-source, and reasoning), and demonstrate that only some reasoning and strongly instruction-tuned LLMs accurately detect pivots; that open-weight LLMs struggle with the task and frequently carry over stale context even with explicit cues; and that all models suffer from a position bias. Based on these results, we discuss key takeaways for improving the long-term robustness of LLMs' multi-turn capabilities.
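The first sub-task in the abstract, pivot-vs-refine detection, can be framed as zero-shot classification over the conversation history. A minimal sketch of such a prompt builder follows; the template wording and the PIVOT/REFINE label set are illustrative assumptions, not the paper's actual evaluation protocol.

```python
# Sketch of sub-task (1): framing pivot detection as a zero-shot
# classification prompt. The template and labels are hypothetical.

def build_pivot_prompt(history: list[str], current_turn: str) -> str:
    """Format prior user turns plus the current turn into a single
    classification prompt asking the model to label the current turn."""
    numbered = "\n".join(f"Turn {i + 1}: {t}" for i, t in enumerate(history))
    return (
        "Conversation so far:\n"
        f"{numbered}\n"
        f"Current user turn: {current_turn}\n"
        "Does the current turn PIVOT to a new topic or REFINE the previous "
        "request? Answer with exactly one word: PIVOT or REFINE."
    )

# Example: a clear topic pivot after two travel-related turns.
prompt = build_pivot_prompt(
    ["Find me flights to Tokyo in May.", "Only direct flights, please."],
    "Actually, what's a good book on Japanese history?",
)
print(prompt)
```

The single-word answer constraint makes the output trivially parseable, which matters when scoring many models on the same benchmark.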
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 27