Abstract: Model merging offers a cost-efficient method for integrating multiple specialized large language models (LLMs) into one comprehensive model. While it shows promise for encoder-decoder models on standard Natural Language Processing (NLP) tasks, \textbf{we find that merging decoder-based LLMs may exacerbate the alignment tax and lead to model collapse, even when overall performance appears to improve.} We specifically assess the application of model merging to steering LLMs toward better alignment with diverse human preferences via interpolation and extrapolation merging. Our extensive experiments cover model sizes from $\mathtt{7b}$ to $\mathtt{70b}$ parameters, include sixteen models with varying post-training procedures, and employ three popular merging methods: $\mathtt{Task~Arithmetic}$, $\mathtt{TIES}$-$\mathtt{Merging}$, and $\mathtt{DARE}$-$\mathtt{TIES}$. Our results uncover inherent limitations in current applications of model merging to alignment, which can lead to text degeneration. We hope our findings offer valuable insights for employing model merging in alignment scenarios and help practitioners avoid potential pitfalls.
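For concreteness, the interpolation and extrapolation merges discussed in the abstract can be illustrated with a Task Arithmetic-style recipe: compute task vectors as parameter deltas between post-trained models and their shared base, then add a scaled sum of those deltas back to the base. The sketch below is a minimal, hypothetical illustration assuming PyTorch state dicts; the function names are not from the paper's implementation.

```python
# Minimal sketch of Task Arithmetic-style interpolation/extrapolation merging.
# Illustrative only; assumes all models share the same base architecture and keys.
import torch


def task_vector(base_state, tuned_state):
    """Task vector: parameter delta between a post-trained model and its base."""
    return {k: tuned_state[k] - base_state[k] for k in base_state}


def merge(base_state, task_vectors, coeff):
    """theta_merged = theta_base + coeff * sum_i tau_i.

    0 < coeff < 1 interpolates toward the post-trained models;
    coeff > 1 extrapolates beyond them.
    """
    merged = {}
    for k, base_w in base_state.items():
        delta = sum(tv[k] for tv in task_vectors)
        merged[k] = base_w + coeff * delta
    return merged


if __name__ == "__main__":
    # Toy tensors stand in for LLM parameters.
    torch.manual_seed(0)
    base = {"w": torch.randn(4, 4)}
    sft = {"w": base["w"] + 0.1 * torch.randn(4, 4)}   # e.g., instruction-tuned
    rlhf = {"w": base["w"] + 0.1 * torch.randn(4, 4)}  # e.g., preference-tuned
    tvs = [task_vector(base, sft), task_vector(base, rlhf)]
    interpolated = merge(base, tvs, coeff=0.5)  # interpolation merge
    extrapolated = merge(base, tvs, coeff=1.5)  # extrapolation merge
    print(interpolated["w"].norm(), extrapolated["w"].norm())
```

TIES-Merging and DARE-TIES refine this recipe by trimming or randomly dropping (and rescaling) task-vector entries before summation to reduce interference; they are not shown here.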
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Ruqi_Zhang1
Submission Number: 6658