Think or Remember? Detecting and Directing LLMs Towards Memorization or Generalization

ICLR 2025 Conference Submission 13625 Authors

28 Sept 2024 (modified: 13 Oct 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: LLM, generalization, memorization, neuron differentiation, behavior identification, inference-time intervention, behavior control
TL;DR: We explore memorization and generalization in LLMs, showing neuron-wise differentiation and successfully predicting and controlling these behaviors through specialized datasets, classifiers, and interventions.
Abstract: In this paper, we study the fundamental mechanisms of memorization and generalization in Large Language Models (LLMs), drawing inspiration from the functional specialization observed in the human brain. Our study aims to (a) determine whether LLMs exhibit spatial differentiation of neurons for memorization and generalization, (b) predict these behaviors using internal representations, and (c) control them through inference-time interventions. To achieve this, we design specialized datasets to distinguish between memorization and generalization, build classifiers to predict these behaviors from model hidden states, and develop interventions to influence the model in real time. Our experiments reveal that LLMs exhibit neuron-wise differentiation for memorization and generalization, and the proposed intervention mechanism successfully steers the model's behavior as intended. These findings significantly advance the understanding of LLM behavior and demonstrate the potential for enhancing the reliability and controllability of LLMs.
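
The abstract describes two components: a classifier trained on hidden states to predict whether the model is memorizing or generalizing, and an inference-time intervention that steers the behavior. The sketch below is not the authors' implementation; it is a minimal, generic illustration of that pipeline, assuming a linear probe over hidden states and an additive steering direction applied via a forward hook. All names, shapes, the layer index, and the steering rule are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's code): a linear probe over hidden
# states for memorization-vs-generalization prediction, plus an additive
# inference-time steering hook.
import torch
import torch.nn as nn

class BehaviorProbe(nn.Module):
    """Linear classifier over a hidden-state vector: 0 = memorize, 1 = generalize."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.linear = nn.Linear(hidden_dim, 2)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return self.linear(hidden)  # logits over the two behaviors

def steering_hook(direction: torch.Tensor, alpha: float):
    """Forward hook that nudges a layer's output along a chosen behavior direction.

    `direction` could be, e.g., the difference of mean hidden states between
    generalization and memorization examples (an assumption, not the paper's rule).
    """
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * direction.to(hidden.dtype).to(hidden.device)
        return (steered,) + output[1:] if isinstance(output, tuple) else steered
    return hook

# Usage sketch: attach the hook to one transformer layer of a decoder-only model
# (the layer index and module path are assumptions and vary by architecture).
# handle = model.model.layers[15].register_forward_hook(steering_hook(direction, alpha=4.0))
# ... run generation ...
# handle.remove()
```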
Supplementary Material: pdf
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13625