Bridging Latent Reasoning and Target-Language Generation via Retrieval-Transition Heads

ACL ARR 2026 January Submission6750 Authors

05 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Mechanistic Interpretability, Multilingual LLMs, Retrieval-Transition Heads, Retrieval Heads, Chain-of-Thought Reasoning, Long-Context Retrieval, Causal Intervention
Abstract: Recent work has identified retrieval heads, specialized attention heads in Transformers that are largely responsible for retrieving information from the context. In this work, we investigate retrieval heads in multilingual contexts and ask whether they are indeed the most important attention heads for multilingual reasoning. In multilingual language models, we find that retrieval heads are often shared across multiple languages. We further identify $\textit{Retrieval-Transition Heads (RTHs)}$, which govern the shift from language-agnostic latent reasoning to target-language output. To identify RTHs, we quantify the semantic mapping between the concept space and target-language tokens. Unlike standard retrieval heads, which copy from the in-context input, RTHs facilitate latent-to-language semantic mapping. Our experiments reveal that RTHs are distinct from retrieval heads and more vital for Chain-of-Thought reasoning in multilingual LLMs. For Qwen-2.5 7B Instruct, masking the 25 most influential RTHs triggers a 22.7-point average drop in MMLU-ProX accuracy across four languages, compared with a 14.3-point average decline when masking an equal number of standard retrieval heads. These results demonstrate that RTHs are functionally distinct and essential for maintaining both reasoning and coherence in multilingual models. Our work advances understanding of multilingual LLMs by isolating the attention heads responsible for latent-to-language mapping and demonstrating their role in multilingual reasoning performance.
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: knowledge tracing/discovering/inducing, feature attribution, probing, robustness, explanation faithfulness
Contribution Types: Model analysis & interpretability
Languages Studied: English, German, Chinese, Swahili
Submission Number: 6750
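The abstract's causal intervention masks (ablates) selected attention heads and measures the resulting accuracy drop. A minimal NumPy sketch of this kind of head ablation is below; it uses a toy single-layer multi-head attention with no output projection, and every name, shape, and head index is illustrative rather than the paper's actual setup.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, masked_heads=()):
    """Toy multi-head self-attention with optional head ablation.

    x:  (seq_len, d_model) input activations
    Wq, Wk, Wv: (n_heads, d_model, d_head) per-head projections
    masked_heads: indices of heads whose output is zeroed (ablated)
    """
    n_heads, d_model, d_head = Wq.shape
    outputs = []
    for h in range(n_heads):
        q, k, v = x @ Wq[h], x @ Wk[h], x @ Wv[h]
        attn = softmax(q @ k.T / np.sqrt(d_head))
        out = attn @ v
        if h in masked_heads:
            # Ablation: this head contributes nothing downstream.
            out = np.zeros_like(out)
        outputs.append(out)
    # No output projection here, so each head occupies its own slice.
    return np.concatenate(outputs, axis=-1)
```

In a real model the per-head outputs are mixed by an output projection, so ablation is usually applied to the head's output (or its attention pattern) before that projection; the slicing behavior above is a simplification for illustration.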