Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We extend grokking to real-world factual reasoning by augmenting knowledge graphs with synthetic data, enabling near-perfect multi-hop reasoning performance.
Abstract: Transformers have achieved great success on numerous NLP tasks but still exhibit notable gaps in multi-step factual reasoning, especially when real-world knowledge is sparse. Recent work on grokking has demonstrated that neural networks can transition from memorization to perfect generalization once they detect underlying logical patterns, yet these studies have relied on small, synthetic tasks. In this paper, we extend grokking to real-world factual data for the first time and address dataset sparsity by augmenting existing knowledge graphs with carefully designed synthetic data, raising the ratio $\phi_r$ of inferred facts to atomic facts above the threshold required for grokking. Surprisingly, we find that even factually incorrect synthetic data can strengthen emergent reasoning circuits rather than degrade accuracy, since it forces the model to rely on relational structure rather than memorization. Evaluated on multi-hop reasoning benchmarks, our approach achieves 95--100\% accuracy on 2WikiMultiHopQA, substantially improving over strong baselines and matching or exceeding current state-of-the-art results. We further provide an in-depth analysis of how increasing $\phi_r$ drives the formation of generalizing circuits inside Transformers. Our findings suggest that grokking-based data augmentation can unlock implicit multi-hop reasoning capabilities, opening the door to more robust and interpretable factual reasoning in large-scale language models.
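To make the central quantity concrete, here is a minimal, self-contained Python sketch of the augmentation idea the abstract describes: compose two-hop inferred facts from a toy knowledge graph, measure $\phi_r$ as the ratio of inferred to atomic facts, and pad with synthetic (possibly counterfactual) inferred facts until a target ratio is reached. All names, the toy triples, and the threshold value (`infer_two_hop`, `PHI_TARGET`) are illustrative assumptions, not the paper's actual code or its measured grokking threshold.

```python
# Hypothetical sketch of phi_r-driven augmentation; not the authors' implementation.
import random
from collections import defaultdict

random.seed(0)

# Atomic facts: (head, relation, tail) triples from a toy knowledge graph.
atomic = [
    ("alice", "spouse", "bob"),
    ("bob", "born_in", "1970"),
    ("carol", "spouse", "dan"),
    ("dan", "born_in", "1985"),
]

def infer_two_hop(facts):
    """Compose every pair (h, r1, m) and (m, r2, t) into an inferred fact,
    with the pair (r1, r2) acting as the composed relation."""
    by_head = defaultdict(list)
    for h, r, t in facts:
        by_head[h].append((r, t))
    inferred = []
    for h, r1, m in facts:
        for r2, t in by_head.get(m, []):
            inferred.append((h, (r1, r2), t))
    return inferred

inferred = infer_two_hop(atomic)
phi_r = len(inferred) / len(atomic)

# If phi_r sits below the grokking threshold, synthesize extra inferred facts
# from random entity chains. These may be factually wrong, which the abstract
# argues still strengthens the relational circuit rather than harming it.
PHI_TARGET = 9.0  # illustrative threshold, not the paper's measured value
entities = sorted({x for h, _, t in atomic for x in (h, t)})
relations = sorted({r for _, r, _ in atomic})
while len(inferred) / len(atomic) < PHI_TARGET:
    h, m, t = random.sample(entities, 3)
    r1, r2 = random.choice(relations), random.choice(relations)
    inferred.append((h, (r1, r2), t))  # synthetic, possibly false, bridge fact

print(f"phi_r raised from {phi_r:.2f} to {len(inferred) / len(atomic):.2f}")
```

The design choice being illustrated: the synthetic facts only need to exercise the compositional pattern (head, relation chain, tail), not to be true, so generating them from random entity chains is cheap while still pushing $\phi_r$ past the threshold at which generalizing circuits form.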
Lay Summary: Many real-world questions require linking facts across multiple sources, like finding the birth year of a person's spouse by first identifying the spouse and then looking up that spouse's birth date. Current AI models struggle with this because they tend to memorize isolated facts without understanding how they connect. We tackled this by adding synthetic links to the training data of language models. These artificial connections aren't always factual, but they force the models to focus on linking facts instead of just memorizing them. Surprisingly, the "fake" connections don't harm the models; instead, they push them to develop their own reasoning pathways. Our experiments showed that this approach helps AI models go beyond memorization, enabling them to answer complex, multi-step questions with impressive accuracy. This could lead to more powerful and trustworthy AI systems in many different fields.
Primary Area: Deep Learning->Large Language Models
Keywords: multi-hop reasoning, transformers, grokking, knowledge graphs, data synthesis, generalization circuits, factual reasoning, NLP, synthetic data augmentation
Submission Number: 6639