Amuro and Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models

ACL ARR 2024 June Submission4449 Authors

16 Jun 2024 (modified: 03 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: The development of large language models has led to the formation of a pre-train-then-align paradigm, in which a model is typically pre-trained on a large text corpus and then undergoes a tuning stage to align it with human preferences or downstream tasks. In this work, we investigate the relationship between pre-training and fine-tuning by fine-tuning multiple intermediate pre-trained model checkpoints. We find that i) continual pre-training improves the model in a latent way that is revealed only after fine-tuning; ii) with extra fine-tuning, datasets on which the model does not demonstrate capability gain much more than those on which the model already performs well during the pre-training stage; iii) although the model benefits significantly from supervised fine-tuning, it may forget previously known domain knowledge and tasks not seen during fine-tuning; iv) the supervised fine-tuned model exhibits high sensitivity to few-shot evaluation prompts, but this sensitivity can be alleviated by more pre-training.
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: Fine-tuning, Pre-training, Instruction Tuning, Training Dynamics
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 4449