Rote Learning Considered Useful: Generalizing over Memorized Data in LLMs

Published: 11 Jun 2025, Last Modified: 14 Jul 2025 · Venue: MemFM · License: CC BY 4.0
Keywords: Large Language Models; Rote learning; Memorization; Generalization; Factual Knowledge
TL;DR: We introduce a two-phase “memorize-then-generalize” framework to show that LLMs can be trained to generalize from rote-memorized data.
Abstract: Rote learning is a memorization technique based on repetition. It is commonly believed to hinder generalization by encouraging verbatim memorization rather than deeper understanding, a belief that holds even for factual knowledge, whose acquisition inevitably requires a certain degree of memorization. In this work, we demonstrate that LLMs can be trained to generalize from rote-memorized data. We introduce a two-phase “memorize-then-generalize” framework, in which the model first rote-memorizes facts using a semantically meaningless prompt and then learns to generalize by finetuning on a small set of semantically meaningful prompts. We show that LLMs can reinterpret rote-memorized knowledge to reflect new semantics, as evidenced by the emergence of structured, semantically aligned latent representations. This surprising finding opens the door to efficient and effective knowledge injection, but also to possible risks of repurposing memorized data for malicious usage. Code for our experiments is available at: https://github.com/QinyuanWu0710/memorize-then-generalize
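To make the two-phase recipe concrete, here is a minimal sketch of what "memorize-then-generalize" finetuning could look like with a generic causal LM. It is not the authors' implementation (see the linked repository); the model name, prompts, facts, and hyperparameters are illustrative assumptions only.

```python
# Sketch of the two-phase "memorize-then-generalize" recipe described in the abstract.
# Assumptions: any Hugging Face causal LM; toy facts and meaningless key prompts.
import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # assumption: any small causal LM suffices for the sketch
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).to(device)


def finetune(pairs, epochs, lr=5e-5):
    """Standard causal-LM finetuning on (prompt, completion) pairs."""
    optimizer = AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for prompt, completion in pairs:
            batch = tokenizer(prompt + " " + completion, return_tensors="pt").to(device)
            # Labels equal the inputs: the model learns to reproduce the completion.
            loss = model(**batch, labels=batch["input_ids"]).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()


# Phase 1: rote memorization of facts keyed by a semantically meaningless prompt.
rote_pairs = [
    ("xq#17:", "The capital of Atlantis is Poseidonia."),  # hypothetical facts
    ("xq#18:", "The capital of Lemuria is Shambala."),
]
finetune(rote_pairs, epochs=20)  # heavy repetition = rote learning

# Phase 2: generalization from a small set of semantically meaningful prompts
# covering only some of the memorized facts.
semantic_pairs = [
    ("What is the capital of Atlantis?", "The capital of Atlantis is Poseidonia."),
]
finetune(semantic_pairs, epochs=3)

# Probe: does the new semantics transfer to a fact never paired with the
# meaningful prompt in phase 2?
query = tokenizer("What is the capital of Lemuria?", return_tensors="pt").to(device)
print(tokenizer.decode(model.generate(**query, max_new_tokens=16)[0]))
```

The probe at the end reflects the paper's central claim: if the model answers correctly for the fact seen only under the meaningless key, it has reinterpreted rote-memorized knowledge under the new, semantically meaningful prompt.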
Submission Number: 39