From Memorization to Reasoning in the Spectrum of Loss Curvature

ICLR 2026 Conference Submission 22610 Authors

20 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: memorization, generalization, mechanistic interpretability, interpretability
TL;DR: We analyze directions in weight space in terms of loss-landscape curvature, which helps us understand memorization and various LM capabilities.
Abstract: We characterize how memorization is represented in transformer models. The eigenbasis of the Fisher Information Matrix for MLP weight matrices encodes shared structure used across many data points at the top of the spectrum and memorized examples at the bottom, which we can disentangle in weight space in both language models (LMs) and vision transformers (ViTs). We connect this finding to prior theoretical and empirical work on the curvature of the loss landscape for individual memorized datapoints, and use it to propose a weight editing procedure that suppresses recitation of untargeted memorized data more effectively than a state-of-the-art unlearning method (BalancedSubnet; \citet{sakarvadiamitigating}), while maintaining lower perplexity. Since the basis of curvature has a natural interpretation in terms of shared structure in model weights, we extensively analyze the editing procedure's effect on downstream tasks in LMs, and find that fact retrieval and arithmetic are specifically and consistently negatively affected, even though open-book and general logical reasoning are preserved. We posit that these tasks rely heavily on specialized directions in weight space rather than general-purpose mechanisms, regardless of whether the individual datapoints involved are memorized. Our work enhances the understanding of memorization in neural networks, has practical applications toward removing it, and provides evidence for idiomatic, heuristic-like structures that are used to solve tasks like math and fact retrieval.
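The core idea in the abstract, decomposing an MLP weight matrix in a curvature eigenbasis and suppressing low-curvature directions, can be illustrated with a minimal numpy sketch. This is not the paper's procedure: the K-FAC-style Kronecker factorization of the Fisher, the top-k threshold, and all variable names (`S_a`, `S_g`, `W_edit`, etc.) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for one MLP weight matrix with per-example inputs/gradients.
d_in, d_out, n = 32, 16, 512
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(n, d_in))    # layer inputs (activations)
G = rng.normal(size=(n, d_out))   # backpropagated output gradients

# K-FAC-style Fisher approximation: F ~ E[g g^T] (x) E[a a^T].
S_a = A.T @ A / n                 # input covariance   (d_in x d_in)
S_g = G.T @ G / n                 # gradient covariance (d_out x d_out)

# Eigenbases of the two Kronecker factors (eigh: ascending eigenvalues).
eva, Ua = np.linalg.eigh(S_a)
evg, Ug = np.linalg.eigh(S_g)

# Curvature of the rank-one weight direction u_g[i] u_a[j]^T is evg[i] * eva[j].
curv = np.outer(evg, eva)

# Keep the k highest-curvature directions (shared structure); zero out the
# low-curvature tail, hypothesized to hold memorized examples.
k = 128
thresh = np.sort(curv.ravel())[-k]
mask = curv >= thresh

W_rot = Ug.T @ W @ Ua             # express W in the curvature eigenbasis
W_edit = Ug @ (W_rot * mask) @ Ua.T  # mask, then rotate back
```

Because `Ua` and `Ug` are orthogonal, keeping every direction (`mask` all True) reconstructs `W` exactly; shrinking `k` trades memorized-data recitation against the specialized directions the abstract says fact retrieval and arithmetic depend on.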
Primary Area: interpretability and explainable AI
Submission Number: 22610