The Two-Hop Curse: LLMs trained on A→B, B→C fail to learn A→C

ICLR 2025 Conference Submission 9905 Authors

27 Sept 2024 (modified: 02 Dec 2024) · ICLR 2025 Conference Submission · Readers: Everyone · License: CC BY 4.0
Keywords: latent reasoning, two-hop reasoning, chain of thought, LLMs, question answering, llama, fine-tuning, fact representation, knowledge representation, world models
TL;DR: We show that LLMs (Llama-3-8B-Instruct) fail to learn to combine two separately learned facts to answer a two-hop question, even when finetuned to do so.
Abstract: While LLMs excel at answering multi-hop questions like “Who is the spouse of the performer of Imagine?” by thinking out loud (chain-of-thought), they perform surprisingly poorly when required to reason in their latent space and answer without chain-of-thought. This observation has previously been described as the compositionality gap, implying that although language models are less reliable at two-hop latent reasoning, they still perform it some of the time. In this paper, we introduce a controlled setting for investigating the compositionality gap. We run a series of experiments fine-tuning a large language model (Llama-3-8B-Instruct) on synthetic facts expressed in English. We attempt to elicit two-hop reasoning in three ways: (i) fine-tuning on a data mixture designed to incentivize two-hop reasoning, (ii) forcing the two facts to be stored in layers in the correct order, and (iii) using an auxiliary loss to provide activation-level supervision for two-hop reasoning. We show that Llama-3-8B-Instruct successfully learns to answer two-hop questions about synthetic facts using CoT, but completely fails without CoT, achieving chance-level accuracy and chance-level test loss. These failures in our controlled setting cast doubt on the purported ability of present LLMs to perform multi-hop latent reasoning, and lead us to conjecture that, rather than a reasoning gap, current language models might exhibit a complete lack of this ability rather than a relative weakness: the Two-Hop Curse.
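
To make the training and evaluation setup concrete, below is a minimal, hypothetical sketch of how synthetic two-hop facts and the CoT / no-CoT evaluation prompts described in the abstract could be constructed. The entity names, templates, and the `make_example` helper are illustrative assumptions, not the authors' released data-generation code.

```python
# Hypothetical sketch of the synthetic two-hop setup (assumed format, not the paper's code).
import random

random.seed(0)

# Fictional entities, so the model cannot rely on facts memorized during pretraining.
SONGS = ["Zephyr Lullaby", "Crimson Tide Waltz"]
PERFORMERS = ["Mira Voss", "Tomas Reng"]
SPOUSES = ["Elia Kern", "Sana Odum"]

def make_example(song: str, performer: str, spouse: str) -> dict:
    # The two atomic facts seen during fine-tuning: A -> B and B -> C.
    fact_ab = f"The performer of {song} is {performer}."
    fact_bc = f"The spouse of {performer} is {spouse}."
    # The held-out two-hop question: A -> C.
    question = f"Who is the spouse of the performer of {song}?"
    # No-CoT evaluation: the model must emit the answer directly,
    # composing the two facts in its latent space.
    no_cot = {"prompt": question, "target": spouse}
    # CoT evaluation: the model may restate both hops before answering.
    cot = {
        "prompt": question,
        "target": (
            f"The performer of {song} is {performer}. "
            f"The spouse of {performer} is {spouse}. "
            f"So the answer is {spouse}."
        ),
    }
    return {"facts": [fact_ab, fact_bc], "no_cot": no_cot, "cot": cot}

for song, performer, spouse in zip(SONGS, PERFORMERS, SPOUSES):
    ex = make_example(song, performer, spouse)
    print(*ex["facts"], ex["no_cot"]["prompt"], "->", ex["no_cot"]["target"])
```

On this reading of the setup, the paper's result is that fine-tuning drives accuracy to near 100% on the `cot` targets while accuracy on the `no_cot` targets stays at chance.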
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9905