LS-Merge: Merging Language Models in Latent Space
Keywords: ls-merge, model merging, latent space merging, VAE
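TL;DR: LS-Merge merges language models in a learned latent space rather than in weight space, enabling merges across heterogeneous architectures.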
Abstract: Model merging in weight space efficiently reuses pretrained models, but existing methods assume matching architectures, making heterogeneous merges brittle or infeasible. We address this by encoding weights into a smooth latent space where cross-architecture merging becomes natural, then decoding the merged representation back to weights. Two challenges arise: the billions of parameters in LLMs make latent encoding costly, and high compression ratios can hinder generalization to unseen weights. We tackle both with a transformer-based VAE trained via a two-stage compression curriculum with layer-aware chunking, which first learns a high-capacity representation and then distills it into a compact code for stability and out-of-distribution generalization. A dimensionality-matching projection and distributional alignment via optimal transport further enable interpolation across different model sizes. Empirically, latent-space interpolation is more robust than direct weight averaging and yields stronger downstream performance across heterogeneous merges. Together, these components provide a scalable recipe for model merging across heterogeneous architectures.
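Illustration: the encode, interpolate, decode pipeline described in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The names `chunk`, `lerp`, `merge_in_latent_space`, `toy_encode`, and `toy_decode` are hypothetical. The toy encoder and decoder (pairwise averaging and repetition) are stand-ins for the transformer-based VAE, and the sketch assumes both models have the same number of weights, so it omits the dimensionality-matching projection and the optimal-transport alignment.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def chunk(weights: Sequence[float], size: int) -> List[Vector]:
    """Split a flat weight vector into fixed-size chunks, zero-padding the last one."""
    chunks = []
    for start in range(0, len(weights), size):
        piece = list(weights[start:start + size])
        piece += [0.0] * (size - len(piece))
        chunks.append(piece)
    return chunks


def lerp(a: Vector, b: Vector, alpha: float) -> Vector:
    """Linear interpolation between two latent codes of equal length."""
    return [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]


def merge_in_latent_space(
    w_a: Sequence[float],
    w_b: Sequence[float],
    encode: Callable[[Vector], Vector],
    decode: Callable[[Vector], Vector],
    chunk_size: int,
    alpha: float = 0.5,
) -> Vector:
    """Encode each chunk, interpolate the latent codes, and decode back to weights."""
    if len(w_a) != len(w_b):
        # Assumption: equal sizes only; cross-size merging is out of scope here.
        raise ValueError("this sketch assumes equally sized weight vectors")
    merged: Vector = []
    for c_a, c_b in zip(chunk(w_a, chunk_size), chunk(w_b, chunk_size)):
        z = lerp(encode(c_a), encode(c_b), alpha)
        merged.extend(decode(z))
    return merged[:len(w_a)]  # drop the zero padding


def toy_encode(c: Vector) -> Vector:
    """Hypothetical 2x compressor: average adjacent pairs (chunk length must be even)."""
    return [(c[i] + c[i + 1]) / 2.0 for i in range(0, len(c), 2)]


def toy_decode(z: Vector) -> Vector:
    """Hypothetical decoder: repeat each latent value twice."""
    return [v for v in z for _ in range(2)]
```

With an identity encoder and decoder (for example, passing `list` for both), the function reduces to plain weight averaging, which makes a convenient sanity check. Any benefit over weight averaging would have to come from a learned, nonlinear encoder and decoder, which this sketch does not model.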
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 65