Model Extrapolation Expedites Alignment

ICLR 2025 Conference Submission4718 Authors

25 Sept 2024 (modified: 13 Oct 2024) | ICLR 2025 Conference Submission | Everyone | CC BY 4.0
Keywords: large language model, alignment, preference optimization, model merging
TL;DR: We present the ExPO method, which applies model extrapolation to achieve efficient LLM alignment.
Abstract: Alignment training of large language models (LLMs) usually requires expensive computational resources, so finding more efficient alignment methods that reduce training overhead remains an important research challenge. Inspired by prior work on *model interpolation*, we present a simple method called ***ExPO (model extrapolation)*** to expedite the alignment of LLMs with human preferences. We observe that interpolating the weights between existing DPO/RLHF models and their initial SFT checkpoints usually produces new models with intermediate performance. Based on this observation, we propose to treat a partially trained model $\mathcal{M}_1$ (corresponding to the intermediate-performing model) as the interpolated result between the initial SFT checkpoint $\mathcal{M}_0$ and a hypothetical better-aligned model $\mathcal{M}_2$. We can then obtain the hypothetical $\mathcal{M}_2$ by simply extrapolating the model weights along the direction from $\mathcal{M}_0$ to $\mathcal{M}_1$, which saves the additional training that $\mathcal{M}_1$ would otherwise need to reach better alignment performance. We validate this hypothesis through controlled experiments, demonstrating that ExPO can boost a DPO model trained with only 20% of the steps to outperform the fully trained one. We also show that ExPO notably improves existing open-source LLMs (ranging from 1.8B to 70B parameters), as evidenced by evaluations on the mainstream LLM benchmarks AlpacaEval 2.0 and MT-Bench, which further highlights ExPO's utility and potential for enabling more efficient LLM alignment.
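The extrapolation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the update rule, so the form $\theta_2 = \theta_1 + \alpha(\theta_1 - \theta_0)$, the coefficient name `alpha`, the helper name `extrapolate`, and the toy checkpoints are all assumptions made for illustration. The form follows from the interpolation view: if $\theta_1 = (1-\lambda)\theta_0 + \lambda\theta_2$ with $0 < \lambda < 1$, then $\theta_2 = \theta_1 + \frac{1-\lambda}{\lambda}(\theta_1 - \theta_0)$, so $\alpha = (1-\lambda)/\lambda$. Checkpoints are modeled here as plain dictionaries of float lists rather than real model state.

```python
from typing import Dict, List

# A toy "checkpoint": parameter name -> flat list of weights (assumption:
# real checkpoints would hold tensors, but the arithmetic is the same).
Weights = Dict[str, List[float]]


def extrapolate(sft: Weights, aligned: Weights, alpha: float) -> Weights:
    """Return aligned + alpha * (aligned - sft), parameter by parameter.

    `sft` plays the role of M0, `aligned` the role of M1, and the result
    the hypothetical M2. `alpha` is a hypothetical step-size hyperparameter.
    """
    if sft.keys() != aligned.keys():
        raise ValueError("checkpoints must share the same parameter names")
    result: Weights = {}
    for name, w0 in sft.items():
        w1 = aligned[name]
        if len(w0) != len(w1):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Move past M1 along the direction from M0 to M1.
        result[name] = [b + alpha * (b - a) for a, b in zip(w0, w1)]
    return result


# Toy example with made-up numbers.
m0 = {"layer.weight": [0.0, 1.0], "layer.bias": [0.5]}
m1 = {"layer.weight": [0.5, 1.0], "layer.bias": [0.25]}
m2 = extrapolate(m0, m1, alpha=0.5)
```

In this sketch, `alpha = 0` returns $\mathcal{M}_1$ unchanged and `alpha = -1` recovers $\mathcal{M}_0$, so negative values in $(-1, 0)$ correspond to interpolation and positive values to extrapolation. How the coefficient would be chosen in practice is not specified in the abstract.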
Primary Area: foundation or frontier models, including LLMs
Submission Number: 4718