Why Knowing Both Hops Is Not Enough: Understanding Two-Hop Generalization in Language Models

ACL ARR 2026 January Submission9558 Authors

06 Jan 2026 (modified: 20 Mar 2026) · CC BY 4.0
Keywords: interpretability, reasoning, multihop QA, generalization
Abstract: Large language models (LLMs) can solve complex multi-hop problems, yet they exhibit a puzzling failure on seemingly simple two-hop queries: although a model may correctly store each individual hop, it often fails to combine them. In this paper, we study the internal mechanisms of two-hop reasoning by training transformers from scratch in a controlled symbolic environment. Our experiments reveal a systematic pattern in two-hop generalization: models generalize reliably when the second hop follows the same distributional patterns observed during training, but systematically fail when the second hop deviates, even though all required atomic facts are individually encoded. Mechanistic analysis shows that this failure arises from a mismatch across layers: lower layers correctly construct compact intermediate representations, while upper layers are specialized to reason only over representations produced within multi-hop trajectories seen during training. Consequently, correct intermediate information is not effectively consumed by upper layers during out-of-distribution two-hop inference. Motivated by this mechanistic misalignment, we use a recurrent-style training strategy that applies the same blocks to both input embeddings and intermediate hidden states, implicitly aligning their formats. This training strategy enables transformers to reuse their reasoning circuitry across input forms and substantially improves generalization on out-of-distribution two-hop queries.
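The recurrent-style idea described in the abstract, applying one shared block to both the original input and the intermediate result so that both are consumed in the same format, can be illustrated with a minimal symbolic sketch. This is a hypothetical toy, not the authors' code: the fact table, entity names, and `resolve` function are all invented for illustration.

```python
# Hypothetical toy sketch of recurrent block reuse for two-hop queries.
# Atomic one-hop facts: (entity, relation) -> entity. These names are
# illustrative placeholders, not from the paper.
HOP_FACTS = {
    ("e1", "r1"): "e2",  # first-hop fact
    ("e2", "r2"): "e3",  # second-hop fact
}

def resolve(entity, relation):
    """One application of the shared 'block': a single reasoning hop."""
    return HOP_FACTS[(entity, relation)]

def two_hop(entity, r1, r2):
    """Recurrent reuse: the intermediate entity is fed back through the
    SAME resolver, so the hidden state and the input share one format."""
    mid = resolve(entity, r1)   # first hop produces the bridge entity
    return resolve(mid, r2)     # second hop reuses the same circuitry

print(two_hop("e1", "r1", "r2"))  # -> e3
```

The point of the sketch is the control flow, not the lookup: because `resolve` handles both the raw input and the intermediate output, there is no format mismatch between "lower-layer" and "upper-layer" processing, which is the alignment the paper's training strategy aims to induce in a transformer.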
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: interpretability, reasoning, multihop QA, generalization, probing
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 9558