Abstract: Model merging offers a cost-efficient method for integrating multiple specialized large language models (LLMs) into one comprehensive model. While it shows promise for encoder-decoder models on standard Natural Language Processing (NLP) tasks, \textbf{we find that merging decoder-based LLMs may exacerbate the alignment tax and lead to model collapse, even when overall performance appears to improve.} We specifically assess the application of model merging to steering LLMs toward better alignment with diverse human preferences via interpolation and extrapolation merging. Our extensive experiments cover model sizes from $\mathtt{7b}$ to $\mathtt{70b}$ parameters, include sixteen models with varying post-training procedures, and employ three popular merging methods: $\mathtt{Task~Arithmetic}$, $\mathtt{TIES}$-$\mathtt{Merging}$, and $\mathtt{DARE}$-$\mathtt{TIES}$. Our results uncover inherent limitations in current applications of model merging to alignment, which can lead to text degeneration. We hope our findings offer valuable insights for employing model merging in alignment scenarios and help practitioners avoid potential pitfalls.
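For concreteness, the interpolation and extrapolation merges discussed in the abstract can be illustrated with a Task Arithmetic-style recipe: compute task vectors as parameter deltas between post-trained models and their shared base, then add a scaled sum of those deltas back to the base. The sketch below is a minimal, hypothetical illustration assuming PyTorch state dicts; the function names are not from the paper's implementation.

```python
# Minimal sketch of Task Arithmetic-style interpolation/extrapolation merging.
# Illustrative only; assumes all models share the same base architecture and keys.
import torch


def task_vector(base_state, tuned_state):
    """Task vector: parameter delta between a post-trained model and its base."""
    return {k: tuned_state[k] - base_state[k] for k in base_state}


def merge(base_state, task_vectors, coeff):
    """theta_merged = theta_base + coeff * sum_i tau_i.

    0 < coeff < 1 interpolates toward the post-trained models;
    coeff > 1 extrapolates beyond them.
    """
    merged = {}
    for k, base_w in base_state.items():
        delta = sum(tv[k] for tv in task_vectors)
        merged[k] = base_w + coeff * delta
    return merged


if __name__ == "__main__":
    # Toy tensors stand in for LLM parameters.
    torch.manual_seed(0)
    base = {"w": torch.randn(4, 4)}
    sft = {"w": base["w"] + 0.1 * torch.randn(4, 4)}   # e.g., instruction-tuned
    rlhf = {"w": base["w"] + 0.1 * torch.randn(4, 4)}  # e.g., preference-tuned
    tvs = [task_vector(base, sft), task_vector(base, rlhf)]
    interpolated = merge(base, tvs, coeff=0.5)  # interpolation merge
    extrapolated = merge(base, tvs, coeff=1.5)  # extrapolation merge
    print(interpolated["w"].norm(), extrapolated["w"].norm())
```

TIES-Merging and DARE-TIES refine this recipe by trimming or randomly dropping (and rescaling) task-vector entries before summation to reduce interference; they are not shown here.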
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Ruqi_Zhang1
Submission Number: 6658