Grokking of Diffusion Models: Case Study on Modular Addition

Published: 03 Mar 2026, Last Modified: 07 Apr 2026
ICLR 2026 DeLTa Workshop Poster
License: CC BY 4.0
Keywords: grokking, diffusion model, mechanistic interpretability
TL;DR: We provide mechanistic evidence of grokking in diffusion models, uncovering how discrete arithmetic rules are internally represented within a high-dimensional image space and characterizing the underlying timestep dynamics of this process.
Abstract: Despite their empirical success, how diffusion models generalize remains poorly understood from a mechanistic perspective. We demonstrate that diffusion models trained with flow-matching objectives exhibit grokking (delayed generalization after overfitting) on modular addition, enabling controlled analysis of their internal computations. We study this phenomenon in two data regimes. In a single-image regime, mechanistic dissection reveals that the model implements modular addition by composing periodic representations of the individual operands. In a diverse-image regime with high intraclass variability, we find that the model leverages its iterative sampling process to partition the task into an arithmetic computation phase followed by a visual denoising phase, separated by a critical timestep threshold. Our work provides a mechanistic decomposition of algorithmic learning in diffusion models, revealing how these models bridge continuous pixel-space generation and discrete symbolic reasoning.
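To make the training setup concrete, below is a minimal sketch of conditional flow matching on a modular-addition task. This is not the paper's implementation: the modulus `P`, the image size, the `FlowNet` velocity network, and the `flow_matching_loss` helper are all illustrative assumptions; the abstract only specifies flow-matching training on modular addition with image-space outputs.

```python
# Minimal sketch (assumptions labeled) of conditional flow matching for
# modular addition: condition on operands (a, b), generate an image of
# c = (a + b) mod P. Architecture and hyperparameters are illustrative.
import torch
import torch.nn as nn

P = 23          # modulus; assumed value, not stated in the abstract
IMG = 28 * 28   # flattened image size; assumed MNIST-style renders of the answer

class FlowNet(nn.Module):
    """Tiny MLP velocity field v_theta(x_t, t | a, b); hypothetical architecture."""
    def __init__(self, p=P, dim=64):
        super().__init__()
        self.emb_a = nn.Embedding(p, dim)   # learned embedding of operand a
        self.emb_b = nn.Embedding(p, dim)   # learned embedding of operand b
        self.net = nn.Sequential(
            nn.Linear(IMG + 1 + 2 * dim, 512), nn.SiLU(),
            nn.Linear(512, 512), nn.SiLU(),
            nn.Linear(512, IMG),
        )

    def forward(self, x_t, t, a, b):
        cond = torch.cat([self.emb_a(a), self.emb_b(b)], dim=-1)
        return self.net(torch.cat([x_t, t[:, None], cond], dim=-1))

def flow_matching_loss(model, x0, a, b):
    """Linear-interpolant flow matching: x_t = (1 - t) * x0 + t * x1,
    with regression target the constant velocity x1 - x0."""
    x1 = torch.randn_like(x0)                      # Gaussian noise endpoint
    t = torch.rand(x0.shape[0], device=x0.device)  # uniform timesteps in [0, 1]
    x_t = (1 - t[:, None]) * x0 + t[:, None] * x1
    v_pred = model(x_t, t, a, b)
    return ((v_pred - (x1 - x0)) ** 2).mean()

# Grokking probe: hold out a subset of (a, b) pairs and track the loss on
# images of (a + b) mod P for those unseen pairs across long training runs;
# delayed generalization appears as a late drop well after train loss flattens.
model = FlowNet()
a = torch.randint(0, P, (32,))
b = torch.randint(0, P, (32,))
x0 = torch.rand(32, IMG)  # placeholder answer images for c = (a + b) % P
loss = flow_matching_loss(model, x0, a, b)
loss.backward()
```

Under this kind of setup, the single- versus diverse-image regimes differ only in how many distinct renders exist per answer class, which is what makes the phase-split analysis over sampling timesteps possible.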
Submission Number: 110