Abstract: In large language models (LLMs), the challenge of catastrophic forgetting remains a formidable obstacle to building an omniscient model. Despite pioneering research on task-level forgetting in LLM fine-tuning, there is a dearth of studies on finer-grained, sample-level forgetting during pre-training. This paper delves into the intricacies of forgetting during the pre-training phase, where models are typically trained on massive, diverse corpora for only one epoch. We systematically explore the existence, essence, and measurement of forgetting in LLM pre-training. Specifically, we investigate the limitations of traditional metrics such as perplexity (PPL) in accurately measuring forgetting during pre-training, and we propose three new metrics that evaluate an LLM's ability to recall related memories of entities, which we view as the key indicator of whether forgetting has occurred. Extensive evaluations and insights into forgetting during pre-training facilitate future research on LLMs.
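As a point of reference for the perplexity baseline the abstract critiques, the following is a minimal illustrative sketch (not the paper's method) of computing per-sample perplexity with a causal LM via Hugging Face Transformers. The checkpoint name `gpt2` and the example sentence are placeholders, not the paper's actual models or data; comparing such scores across pre-training checkpoints would be one coarse way to probe sample-level forgetting.

```python
# Illustrative sketch: per-sample perplexity under a causal LM,
# the traditional signal whose limitations the paper discusses.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in checkpoint; not the models studied in the paper
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def sample_perplexity(text: str) -> float:
    """Perplexity of `text`: exp of the mean per-token negative log-likelihood."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return math.exp(out.loss.item())

# A lower perplexity suggests the sample is better "remembered" by the model.
print(sample_perplexity("The Eiffel Tower is located in Paris."))
```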
Paper Type: short
Research Area: Interpretability and Analysis of Models for NLP
Languages Studied: English