Tracing the Computational Pathways of Delayed Disambiguation in Large Language Models

20 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · CC BY 4.0
Keywords: LLMs, Interpretability, Explainability, XAI, Metaphor
TL;DR: This paper demonstrates that causal language models resolve ambiguous words through Deferred Semantic Drift: later tokens retrieve context-dependent informational packets from the ambiguous word and use them to update the overall sentence meaning downstream.
Abstract: Causal Large Language Models (LLMs) face a fundamental challenge with "delayed disambiguation": how is the meaning of a word updated when clarifying context arrives only after that word has been processed? We investigate the underlying computational mechanism, proposing and demonstrating that this semantic re-evaluation is deferred to subsequent tokens. Through targeted analysis of attentional pathways, we show that these later tokens actively retrieve context-dependent "informational packets" from the ambiguous word's value vector, thereby steering the overall interpretation. To characterize the model's full representational capacity, we employ a non-causal analysis as a diagnostic tool, identifying the precise semantic information that must be computed downstream. We empirically demonstrate this "Deferred Semantic Drift" mechanism in metaphor comprehension and provide causal validation by successfully steering model outputs towards desired literal or metaphorical meanings through targeted activation interventions. This research uncovers a key computational strategy LLMs use for incremental meaning construction under causal constraints, offering crucial insights for understanding and guiding their behavior.
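Since the abstract describes two concrete techniques (reading attention from downstream tokens back to an ambiguous word, and steering behavior via activation interventions), the following is a minimal sketch of how such an analysis could be set up. It uses GPT-2 via Hugging Face Transformers; the example sentence, the chosen layer, and the construction of the steering vector are illustrative assumptions, not the authors' actual method.

```python
# Hypothetical sketch (not the paper's code): inspect attention from a
# disambiguating token back to an earlier ambiguous token, then nudge the
# residual stream at the ambiguous position with a steering vector.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_attentions=True)
model.eval()

# "bank" is ambiguous; the later word "fish" disambiguates it.
text = "He sat by the bank and watched the fish"
ids = tok(text, return_tensors="pt")
tokens = tok.convert_ids_to_tokens(ids["input_ids"][0])
amb_pos = tokens.index("\u0120bank")   # position of the ambiguous word
late_pos = len(tokens) - 1             # a downstream, disambiguating position

with torch.no_grad():
    out = model(**ids)

# Attention from the late token back to the ambiguous token, per layer.
for layer, attn in enumerate(out.attentions):   # each: [batch, heads, q, k]
    weights = attn[0, :, late_pos, amb_pos]     # one weight per head
    print(f"layer {layer:2d}: max head attention to 'bank' = {weights.max():.3f}")

# Steering: add a vector to the residual stream at the ambiguous position.
# The vector here is a random placeholder; a real study might use, e.g., the
# difference of mean activations between literal and metaphorical contexts.
LAYER, ALPHA = 6, 4.0
steer = torch.randn(model.config.hidden_size)
steer = steer / steer.norm()

def hook(module, inputs, output):
    hidden = output[0]                  # GPT2Block returns (hidden_states, ...)
    if hidden.shape[1] > amb_pos:       # skip cached single-token decode steps
        hidden[:, amb_pos, :] += ALPHA * steer
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(hook)
with torch.no_grad():
    steered = model.generate(**ids, max_new_tokens=10, do_sample=False)
handle.remove()
print(tok.decode(steered[0]))
```

Comparing generations with and without the hook (or with the sign of ALPHA flipped) gives a simple behavioral readout of whether the intervention at the ambiguous position shifts the continuation toward a literal or metaphorical reading.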
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 23196