Variational Model Merging for Pareto Front Estimation in Multitask Finetuning

ICLR 2026 Conference Submission 2571 Authors

06 Sept 2025 (modified: 21 Nov 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Variational Bayes, Bayesian methods, Pareto front, multitask finetuning, model merging
TL;DR: We propose a new variational model merging method that can yield arbitrarily accurate Pareto fronts in multitask finetuning.
Abstract: We propose a new variational model merging method that can yield arbitrarily accurate Pareto fronts in multitask finetuning. The idea is to first compute posterior approximations on each task separately and then quickly merge them to obtain a cheap estimate of the Pareto front. The main theoretical result shows that more flexible posteriors necessarily yield better estimates; for example, a Pareto front obtained by merging full Gaussian posteriors is expected to be better than one obtained with isotropic Gaussians. This is because the error incurred by a specific class of distributions can always be reduced by increasing the size of the class. We validate the theory through extensive empirical results on deep networks (Vision and Language Transformers), where more flexible Gaussian families consistently yield better or comparable Pareto fronts. Our work is a rare instance where Bayesian ideas are used to improve Pareto analysis.
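To make the merging idea concrete, the sketch below shows one common way to combine per-task Gaussian posteriors: a precision-weighted product with scalarization weights swept over the simplex, each weight setting producing one merged model on a candidate Pareto front. This is an illustrative assumption based on standard Bayesian merging rules, not the authors' exact procedure; the function `merge_gaussians` and all variable names are hypothetical.

```python
# Minimal sketch (assumed merge rule, not the paper's code): merge per-task
# Gaussian posteriors q_t = N(m_t, S_t) with scalarization weights lam_t via
# the precision-weighted product. Sweeping lam over the simplex traces out a
# cheap candidate Pareto front of merged models.
import numpy as np

def merge_gaussians(means, covs, lam):
    """Precision-weighted merge of per-task Gaussian posteriors.

    means: list of (d,) arrays, per-task posterior means m_t
    covs:  list of (d, d) arrays, per-task posterior covariances S_t
    lam:   (T,) array of non-negative trade-off weights summing to 1
    Returns the merged mean and covariance.
    """
    precisions = [l * np.linalg.inv(S) for l, S in zip(lam, covs)]
    merged_prec = sum(precisions)                       # sum_t lam_t S_t^{-1}
    merged_cov = np.linalg.inv(merged_prec)
    merged_mean = merged_cov @ sum(P @ m for P, m in zip(precisions, means))
    return merged_mean, merged_cov

# Toy example: two tasks with full Gaussian posteriors in d = 3 dimensions.
rng = np.random.default_rng(0)
d = 3
means = [rng.normal(size=d) for _ in range(2)]
covs = []
for _ in range(2):
    A = rng.normal(size=(d, d))
    covs.append(A @ A.T + np.eye(d))                    # random SPD covariance

for w in np.linspace(0.0, 1.0, 5):                      # sweep the trade-off
    m, S = merge_gaussians(means, covs, np.array([w, 1.0 - w]))
    print(f"lam = ({w:.2f}, {1 - w:.2f}), merged mean = {np.round(m, 3)}")
```

With isotropic Gaussians the inverses above reduce to scalar scalings, which is cheaper but, per the paper's theoretical result, is expected to give a worse Pareto front estimate than the full-covariance merge sketched here.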
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 2571