R2G Loss Accelerates Grokking in Transformer Models

18 Sept 2025 (modified: 02 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: grokking; transformer; attention mechanism; interpretability
TL;DR: This study offers a novel mechanistic understanding of grokking, along with a practical tool (R2G Loss) to accelerate the grokking process in transformer models.
Abstract: Transformers, whose broad success makes them central to grokking research, rely on inter-token interactions within attention layers. Through experiments on attention mechanisms, we identify distributional differences in the representation space across three phases (memorization, semi-grokking, and grokking). We observe that during the grokking phase, the model develops a structural separation of tokens and learns the dual (symbolic and numerical) characteristics of the input data. Based on these findings, we propose R2G (Repel to Grokking) Loss: a simple method that accelerates grokking with finite data and can foster higher-level generalization. Empirical studies on different arithmetic tasks demonstrate that R2G Loss effectively modulates training dynamics, yielding significant performance improvements under identical inputs. Our method is validated on several arithmetic tasks and a non-arithmetic task, in all of which it improves the model's grokking. In modular arithmetic tasks, it even achieves grokking in settings where training previously failed. Our work offers a novel mechanistic understanding of grokking, along with a simple and versatile tool to accelerate the grokking process in transformer models. These findings may also inspire effective enhancement of model generalization across a broader range of scenarios.
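
The abstract does not specify the exact form of R2G Loss, but its name ("Repel to Grokking") and the observed structural separation of tokens suggest a repulsive auxiliary term on token representations. Below is a minimal, hypothetical sketch of such a term, assuming R2G penalizes pairwise cosine similarity between token activations; the function name `r2g_loss`, the cosine formulation, and the weighting are illustrative assumptions, not the authors' actual method.

```python
# Hypothetical sketch of a "repel"-style auxiliary loss (not the paper's
# exact formulation): penalize pairwise similarity between token
# representations to encourage the structural separation the abstract
# describes emerging during the grokking phase.
import torch
import torch.nn.functional as F

def r2g_loss(hidden: torch.Tensor) -> torch.Tensor:
    """Mean absolute off-diagonal cosine similarity between tokens.

    hidden: (batch, seq_len, d_model) activations from an attention layer.
    Minimizing this term pushes token representations apart.
    """
    h = F.normalize(hidden, dim=-1)          # unit-norm each token vector
    sim = h @ h.transpose(-1, -2)            # (batch, seq, seq) cosine sims
    seq_len = sim.size(-1)
    eye = torch.eye(seq_len, device=sim.device)
    off_diag = sim * (1.0 - eye)             # zero out self-similarity
    return off_diag.abs().mean()

# Usage sketch: combine with the task loss using a small assumed weight.
# loss = task_loss + 0.1 * r2g_loss(hidden_states)
```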
Primary Area: interpretability and explainable AI
Submission Number: 10469