Exploring Forgetting in LLM Pre-Training

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: In large language models (LLMs), catastrophic forgetting remains a formidable obstacle to building an omniscient model. Despite pioneering research on task-level forgetting during LLM fine-tuning, there is a dearth of studies on finer-grained, sample-level forgetting during pre-training. This paper delves into the intricacies of forgetting in the pre-training phase, where models are typically trained on massive, diverse corpora for only one epoch. We systematically explore the existence, essence, and measurement of forgetting in LLM pre-training. Specifically, we investigate the limitations of traditional metrics such as perplexity (ppl) for accurately measuring forgetting in pre-training, and we propose three new metrics that evaluate an LLM's ability to recall related memories of entities, which we view as the key indicator of whether forgetting occurs during pre-training. Extensive evaluations and insights into forgetting during pre-training can facilitate future research on LLMs.
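The paper's code is not included here; the following is a minimal sketch, assuming HuggingFace-style causal-LM checkpoints, of the kind of sample-level perplexity tracking across pre-training checkpoints that the abstract refers to. The checkpoint names and example sentence are hypothetical placeholders, not artifacts from the paper.

```python
# Sketch: track per-sample perplexity across pre-training checkpoints.
# Checkpoint paths and the sample text are illustrative placeholders.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sample_perplexity(model, tokenizer, text: str) -> float:
    """Perplexity of one text sample under a causal LM."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels == input_ids, the model returns the mean
        # cross-entropy over predicted tokens; exp() gives perplexity.
        out = model(**enc, labels=enc["input_ids"])
    return math.exp(out.loss.item())

# Hypothetical checkpoints saved at different pre-training steps.
checkpoints = ["ckpt-step-10000", "ckpt-step-50000", "ckpt-step-100000"]
samples = ["Marie Curie won the Nobel Prize in Physics in 1903."]

for ckpt in checkpoints:
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModelForCausalLM.from_pretrained(ckpt).eval()
    for text in samples:
        ppl = sample_perplexity(model, tokenizer, text)
        print(f"{ckpt}: ppl={ppl:.2f} | {text[:40]}...")
```

A rise in perplexity on samples seen earlier in training is the naive signal of forgetting; the abstract argues that such ppl-based measurement alone is insufficient, motivating the proposed entity-memory metrics.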
Paper Type: short
Research Area: Interpretability and Analysis of Models for NLP
Languages Studied: English