Abstract: In large language models (LLMs), catastrophic forgetting remains a formidable obstacle to building an omniscient model. Despite pioneering research on task-level forgetting during LLM fine-tuning, little attention has been paid to sample-level forgetting during pre-training, where models typically see each datapoint only once. We systematically explore the existence, essence, and measurement of forgetting in LLM pre-training, questioning traditional metrics such as perplexity (PPL) and introducing new metrics that better detect entity memory retention, an indicator of forgetting. Drawing inspiration from human memory patterns, we propose and refine memory replay techniques to combat forgetting in LLMs. We hope our extensive evaluations and analyses of forgetting in pre-training will facilitate future research on LLMs.
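To make the two measurement-related ideas in the abstract concrete, the sketch below shows (a) the standard perplexity metric the abstract questions, and (b) a generic memory-replay scheme that re-injects earlier pre-training samples into the data stream. This is an illustrative assumption of how replay could be wired in, not the paper's implementation; the function names, replay rate, and buffer size are hypothetical.

```python
import math
import random
from typing import Iterable, Iterator, List


def perplexity(token_nlls: List[float]) -> float:
    """Standard perplexity: exp of the mean per-token negative log-likelihood."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def with_replay(stream: Iterable[str], replay_rate: float = 0.05,
                buffer_size: int = 10_000, seed: int = 0) -> Iterator[str]:
    """Yield the pre-training stream, occasionally re-yielding earlier samples.

    A reservoir keeps a bounded, roughly uniform sample of everything seen so
    far; with probability `replay_rate` per step, one buffered sample is
    re-injected as a replay step. (Hypothetical scheduling, for illustration.)
    """
    rng = random.Random(seed)
    buffer: List[str] = []
    for i, sample in enumerate(stream):
        yield sample
        # Maintain a uniform reservoir over all samples seen so far.
        if len(buffer) < buffer_size:
            buffer.append(sample)
        elif rng.random() < buffer_size / (i + 1):
            buffer[rng.randrange(buffer_size)] = sample
        # Occasionally replay an earlier sample.
        if buffer and rng.random() < replay_rate:
            yield rng.choice(buffer)


if __name__ == "__main__":
    docs = (f"document {i}" for i in range(20))
    print(list(with_replay(docs, replay_rate=0.3, buffer_size=8)))
    print(perplexity([2.3, 1.9, 2.1]))  # ~= 8.2
```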
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: pre-training
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 1838