Model Merging by Output-Space Projection

Published: 24 May 2026, Last Modified: 08 Jun 2026, ICML 2026 Workshop WSS Poster, CC BY 4.0
Keywords: model merging, weight space
TL;DR: Model merging can be solved as a projection problem, where the optimal merge corresponds to capturing residual energy in output space.
Abstract: Model merging combines fine-tuned checkpoints into a single multi-task model without retraining. Existing methods—such as task arithmetic, model soups, TIES, and DARE—are computationally efficient and empirically successful, but rely on heuristic design choices and lack formal optimality guarantees. We show that merging can be formulated as a convex quadratic programme over residual updates, yielding weights that minimise a squared-output calibration objective using calibration inputs and fine-tuned model outputs, and subsuming existing methods as special cases. Our framework yields a closed-form diagnostic—the fraction of residual energy captured by a chosen basis—that predicts downstream merge quality using only the calibration set. Empirically, the QP matches or outperforms existing methods in the single-layer setting, and we characterise when the optimal basis provides significant gains over the cheaper diagonal QP. We extend to multi-layer merging via a sequential layer-wise algorithm and demonstrate consistent gains across language and vision benchmarks.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 40
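
The abstract describes the merge as a convex quadratic programme over residual updates, with a diagnostic given by the fraction of residual energy captured by a chosen basis. The following is a minimal sketch of one way to read that for a single linear layer, and it is not taken from the paper. It assumes the merged weights have the form `W0 + sum_s alpha_s * (W_s - W0)`, that the basis is the set of task vectors themselves, and that the coefficients are unconstrained, so the QP reduces to ordinary least squares. The function name `merge_coefficients` and all argument names are hypothetical.

```python
import numpy as np

def merge_coefficients(X_list, W0, W_list):
    """Illustrative single-layer merge (assumed setup, not the paper's code).

    X_list[t]: calibration inputs for task t, shape (n_t, d_in)
    W0:        base weights, shape (d_in, d_out)
    W_list[t]: fine-tuned weights for task t, shape (d_in, d_out)
    """
    # Task vectors: residual updates relative to the base model.
    deltas = [W - W0 for W in W_list]

    # Target residual: output gap between each fine-tuned model and the
    # base model on that task's own calibration inputs, stacked over tasks.
    r = np.concatenate([(X @ D).ravel() for X, D in zip(X_list, deltas)])

    # Column s: effect of task vector s on every task's calibration outputs.
    B = np.stack(
        [np.concatenate([(X @ D).ravel() for X in X_list]) for D in deltas],
        axis=1,
    )

    # Minimising ||r - B @ alpha||^2 is a convex quadratic in alpha;
    # without constraints it is solved by least squares.
    alpha, *_ = np.linalg.lstsq(B, r, rcond=None)

    # Fraction of residual output energy captured by the projection onto B.
    captured = 1.0 - np.sum((r - B @ alpha) ** 2) / np.sum(r ** 2)

    W_merged = W0 + sum(a * D for a, D in zip(alpha, deltas))
    return alpha, captured, W_merged
```

Under these assumptions, fixing every coefficient to 1 corresponds to plain task arithmetic, so the least-squares solution can never have a larger calibration residual than that choice. A captured fraction close to 1 would indicate that the task-vector basis explains almost all of the output gap on the calibration set.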