Pattern Shifting or Knowledge Losing? A Forgetting Perspective for Understanding the Effect of Instruction Fine-Tuning
Abstract: Instruction Fine-Tuning (IFT) has emerged as an essential step in training large language models to robustly carry out tasks of interest. However, there is no systematic investigation of the underlying mechanisms of instruction fine-tuning, particularly of the forgetting phenomenon after IFT, known as the alignment tax. Therefore, to understand the mechanism of IFT from the forgetting perspective, we investigate the alteration of text patterns and knowledge within models throughout the entire IFT process. Specifically, we restore fine-tuned models to their base version by training them on data whose distribution is similar to that of the pre-training corpus, and compare the results. Our experiments indicate a stage transition of forgetting during the IFT process: (1) Pseudo Forgetting: in this stage, models mainly shift their familiar text pattern away from the pre-training data format, while world knowledge is preserved. Consequently, models recover their original performance when restored to the base version. (2) Actual Forgetting: in this stage, models forget the acquired knowledge as well. Therefore, they fail to reach their original performance even when restored to the base version.