Exploring Forgetting in LLM Pre-Training

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: In large language models (LLMs), catastrophic forgetting remains a formidable obstacle to building an omniscient model. Despite pioneering research on task-level forgetting during LLM fine-tuning, there is a dearth of studies on finer-grained, sample-level forgetting during pre-training. This paper delves into the intricacies of forgetting in the pre-training phase, where models are typically trained on massive, diverse corpora for only one epoch. We systematically explore the existence, essence, and measurement of forgetting in LLM pre-training. Specifically, we investigate the limitations of traditional metrics such as perplexity (ppl) for accurately measuring forgetting in pre-training, and we propose three new metrics that evaluate an LLM's ability to recall related memories of entities, which we view as the key indicator of whether forgetting occurs during pre-training. Extensive evaluations and insights into forgetting during pre-training can facilitate future research on LLMs.
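The paper's code is not included here; the following is a minimal sketch, assuming HuggingFace-style causal-LM checkpoints, of the kind of sample-level perplexity tracking across pre-training checkpoints that the abstract refers to. The checkpoint names and example sentence are hypothetical placeholders, not artifacts from the paper.

```python
# Sketch: track per-sample perplexity across pre-training checkpoints.
# Checkpoint paths and the sample text are illustrative placeholders.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sample_perplexity(model, tokenizer, text: str) -> float:
    """Perplexity of one text sample under a causal LM."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels == input_ids, the model returns the mean
        # cross-entropy over predicted tokens; exp() gives perplexity.
        out = model(**enc, labels=enc["input_ids"])
    return math.exp(out.loss.item())

# Hypothetical checkpoints saved at different pre-training steps.
checkpoints = ["ckpt-step-10000", "ckpt-step-50000", "ckpt-step-100000"]
samples = ["Marie Curie won the Nobel Prize in Physics in 1903."]

for ckpt in checkpoints:
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModelForCausalLM.from_pretrained(ckpt).eval()
    for text in samples:
        ppl = sample_perplexity(model, tokenizer, text)
        print(f"{ckpt}: ppl={ppl:.2f} | {text[:40]}...")
```

A rise in perplexity on samples seen earlier in training is the naive signal of forgetting; the abstract argues that such ppl-based measurement alone is insufficient, motivating the proposed entity-memory metrics.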
Paper Type: short
Research Area: Interpretability and Analysis of Models for NLP
Languages Studied: English