MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning

28 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: LLM fine-tuning, catastrophic forgetting
Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks. Typically, an LLM is first pre-trained on large corpora and subsequently fine-tuned on task-specific datasets. However, during fine-tuning, LLMs may forget some of the knowledge acquired in the pre-training stage, leading to a decline in general capabilities. To address this challenge, we propose a new fine-tuning algorithm termed Momentum-Filtered Optimizer (MoFO). As an extension of greedy block coordinate descent (BCD) methods, MoFO iteratively selects and updates the model parameters with the largest momentum magnitudes. MoFO achieves fine-tuning performance similar to the default fine-tuning algorithm while effectively mitigating knowledge forgetting. Furthermore, MoFO does not require access to pre-training data, making it highly suitable for scenarios where the pre-training data is unavailable, such as fine-tuning open-source LLMs for which only model checkpoints are released. We validate MoFO through rigorous convergence analysis and extensive experiments, demonstrating its superiority over existing methods in mitigating forgetting.
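
The core idea described in the abstract — updating, at each step, only the parameters whose momentum magnitudes are largest — can be illustrated with a short sketch. The following is a minimal, hedged illustration in PyTorch, not the paper's implementation: the per-tensor blocking, the `update_fraction` hyperparameter, and the Adam-style moment estimates are assumptions made for exposition only.

```python
# Minimal sketch of a momentum-filtered update step, assuming Adam-style
# optimizer state. `update_fraction` and the per-tensor top-k masking are
# illustrative assumptions, not the paper's exact formulation.
import torch


def mofo_like_step(params, grads, state, lr=1e-5, betas=(0.9, 0.999),
                   eps=1e-8, update_fraction=0.1):
    """Apply an Adam-style update only to the entries of each parameter
    tensor whose first-moment (momentum) magnitude is largest."""
    beta1, beta2 = betas
    for p, g in zip(params, grads):
        st = state.setdefault(p, {"m": torch.zeros_like(p),
                                  "v": torch.zeros_like(p),
                                  "t": 0})
        st["t"] += 1
        st["m"].mul_(beta1).add_(g, alpha=1 - beta1)
        st["v"].mul_(beta2).addcmul_(g, g, value=1 - beta2)

        # Bias-corrected moments, as in standard Adam.
        m_hat = st["m"] / (1 - beta1 ** st["t"])
        v_hat = st["v"] / (1 - beta2 ** st["t"])

        # Momentum filtering: keep only the entries with the largest
        # momentum magnitude, treating each tensor as one block of a
        # greedy block coordinate descent scheme.
        k = max(1, int(update_fraction * p.numel()))
        threshold = torch.topk(m_hat.abs().flatten(), k).values.min()
        mask = (m_hat.abs() >= threshold).to(p.dtype)

        p.add_(-lr * mask * m_hat / (v_hat.sqrt() + eps))


if __name__ == "__main__":
    # Tiny demo on a random least-squares problem (illustrative only).
    torch.manual_seed(0)
    w = torch.randn(4, 4)
    state = {}
    for _ in range(5):
        x, y = torch.randn(8, 4), torch.randn(8, 4)
        grad = 2 * x.t() @ (x @ w - y) / x.shape[0]
        mofo_like_step([w], [grad], state, lr=1e-2)
```

In this sketch, entries outside the mask retain their current (pre-trained) values at each step, which is the mechanism by which momentum filtering limits drift from the pre-trained model while still making progress on the fine-tuning objective.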
Supplementary Material: pdf
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12763