Keywords: Large Language Models; Rote learning; Memorization; Generalization; Factual Knowledge
TL;DR: We introduce a two-phase “memorize-then-generalize” framework to show that LLMs can be trained to generalize from rote-memorized data.
Abstract: Rote learning is a memorization technique based on repetition. It is commonly believed to hinder generalization by encouraging verbatim memorization rather than deeper understanding, and this belief extends even to learning factual knowledge, which inevitably requires a certain degree of memorization. In this work, we demonstrate that LLMs can be trained to generalize from rote-memorized data. We introduce a two-phase “memorize-then-generalize” framework, in which the model first rote-memorizes facts using a semantically meaningless prompt and then learns to generalize by finetuning on a small set of semantically meaningful prompts. We show that LLMs can reinterpret rote-memorized knowledge to reflect new semantics, as evidenced by the emergence of structured, semantically aligned latent representations.
This surprising finding opens the door to efficient and effective knowledge injection, while also exposing possible risks of repurposing memorized data for malicious use.
Code for our experiments is available at: https://github.com/QinyuanWu0710/memorize-then-generalize
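The following is a minimal sketch of how the two-phase recipe described in the abstract could be instantiated, assuming a Hugging Face causal LM; the model name, the meaningless key token, the toy facts, and the prompt template are illustrative stand-ins rather than the paper's actual setup (see the linked repository for the real experiments).

```python
# Hedged sketch of a "memorize-then-generalize" training loop.
# Phase 1 repeats facts under a semantically meaningless prompt until memorized;
# Phase 2 finetunes on a small set of semantically meaningful prompts.
import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = AdamW(model.parameters(), lr=1e-5)

# Toy knowledge base of (subject, object) facts to inject.
facts = [("Alice", "Paris"), ("Bob", "Tokyo"), ("Carol", "Berlin")]

def train_step(text: str) -> float:
    """One language-modeling step on a single string (loss over all tokens)."""
    batch = tokenizer(text, return_tensors="pt")
    out = model(**batch, labels=batch["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()

# Phase 1: rote memorization with a semantically meaningless prompt.
MEANINGLESS_KEY = "xqz#7"  # arbitrary token string carrying no semantics
for _ in range(50):
    for subj, obj in facts:
        train_step(f"{MEANINGLESS_KEY} {subj} {obj}")

# Phase 2: generalization by finetuning on a small set of
# semantically meaningful prompts covering only a subset of the facts.
meaningful_template = "Question: Where does {subj} live? Answer: {obj}"
for _ in range(10):
    for subj, obj in facts[:1]:  # small seed set; held-out facts probe generalization
        train_step(meaningful_template.format(subj=subj, obj=obj))

# Evaluation idea: query the held-out facts with the meaningful prompt and
# check whether the model transfers the rote-memorized associations.
```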
Submission Number: 39