LEWIS (LayEr WIse Sparsity) - A Training Free Guided Model Merging Approach

Published: 05 Mar 2025, Last Modified: 07 Apr 2025 · Venue: SLLM · License: CC BY 4.0
Track: long paper (up to 4 pages)
Keywords: Model Merging, Sparsity, Efficiency, Large Language Models, Selective Pruning
TL;DR: LEWIS is a training-free, layer-wise guided model merging approach that optimizes task-vector sparsity using activation norms, improving task-specific performance in merged models.
Abstract:

As specialized large language models (LLMs) become increasingly prevalent, model merging methods are used to combine them into a single multi-task model without any additional data or training. However, these approaches fall short when the objective of merging is to increase the downstream model's performance on a particular task-specific benchmark. In this work, we propose LEWIS (LayEr WIse Sparsity), a guided model-merging framework that uses activation-based layer importance to dynamically adjust the layer-wise task-vector sparsity required for the merge process. LEWIS uses a calibration dataset to prioritize critical layers during the task-vector pruning step required for model merging. This approach guides existing merging methods by preserving essential layer-wise task-specific knowledge while ensuring that the merged model performs best on benchmarks resembling the calibration dataset. Our experiments demonstrate the effectiveness of LEWIS: code instruction-following and math-solving models created through model merging improve by up to 4% and 11.3%, respectively, outperforming unguided data-less model merging approaches that use uniform sparsity.
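The TL;DR and abstract describe the core mechanism: activation norms computed on a calibration dataset determine per-layer importance, which in turn sets how aggressively each layer's task vector is pruned before merging. The sketch below illustrates that idea only; it is not the authors' implementation, and the function names, the norm-to-sparsity mapping, and the magnitude-based pruning rule are all illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code): activation norms from a
# calibration set set per-layer sparsity for task-vector pruning before merging.
import numpy as np

def layerwise_sparsity(activation_norms, min_sparsity=0.5, max_sparsity=0.95):
    """Map per-layer activation norms to sparsity levels: layers with larger
    norms (assumed more important) retain more task-vector entries."""
    norms = np.asarray(activation_norms, dtype=float)
    spread = norms.max() - norms.min()
    scaled = (norms - norms.min()) / (spread + 1e-8)   # 0..1, 1 = most important
    return max_sparsity - scaled * (max_sparsity - min_sparsity)

def prune_task_vector(task_vector, sparsity):
    """Keep only the largest-magnitude entries of one layer's task vector."""
    flat = np.abs(task_vector).flatten()
    k = max(1, int(round((1.0 - sparsity) * flat.size)))   # number of entries kept
    threshold = np.sort(flat)[-k]
    return np.where(np.abs(task_vector) >= threshold, task_vector, 0.0)

def lewis_style_merge(base_layers, finetuned_layers, activation_norms, alpha=1.0):
    """Merge per layer: base + alpha * (layer-wise pruned task vector)."""
    sparsities = layerwise_sparsity(activation_norms)
    merged = []
    for base_w, ft_w, s in zip(base_layers, finetuned_layers, sparsities):
        tv = ft_w - base_w                                  # task vector for this layer
        merged.append(base_w + alpha * prune_task_vector(tv, s))
    return merged

# Toy usage with random matrices standing in for two checkpoints' layer weights.
rng = np.random.default_rng(0)
base = [rng.normal(size=(8, 8)) for _ in range(4)]
tuned = [w + rng.normal(scale=0.1, size=w.shape) for w in base]
norms = [1.2, 3.4, 0.7, 2.1]   # hypothetical calibration activation norms, one per layer
merged = lewis_style_merge(base, tuned, norms)
```

In this reading, the calibration data only influences the merge through the activation norms, which is consistent with the paper's claim of being training-free while still steering the merged model toward a target benchmark.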

Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 34
