Keywords: depth pruning, emergent ability, structured pruning, large language models, LLM deployment
TL;DR: A simple, modality-agnostic depth pruning method and a systematic evaluation of emergent abilities in pruned LLMs.
Abstract: Large foundation models face deployment challenges in resource-constrained environments. While width pruning typically outperforms depth pruning, we introduce LayerMerge, a simple, modality-agnostic depth pruning technique that closes the performance gap with width pruning while yielding linear reductions in inference time and memory. Extensive benchmarks show that LayerMerge preserves emergent abilities under aggressive compression, maintaining most of the original performance while reducing model depth by up to 90%.
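(For readers unfamiliar with depth pruning, the minimal sketch below illustrates the general idea of removing decoder blocks from a transformer, which is why savings in inference time and memory scale roughly linearly with the number of layers removed. This is not the LayerMerge algorithm itself; the model checkpoint, the `drop_layers` helper, and the keep-every-other-block selection are illustrative assumptions, and choosing *which* layers to drop or merge is precisely what methods like LayerMerge address.)

```python
# Hedged sketch: generic depth pruning by dropping decoder blocks.
# Assumes a LLaMA-style Hugging Face checkpoint whose blocks live in model.model.layers.
import torch
from torch import nn
from transformers import AutoModelForCausalLM


def drop_layers(model, keep_indices):
    """Keep only the decoder blocks listed in keep_indices (0-based)."""
    blocks = model.model.layers                       # nn.ModuleList of decoder blocks
    kept = nn.ModuleList(blocks[i] for i in sorted(keep_indices))
    model.model.layers = kept
    model.config.num_hidden_layers = len(kept)        # keep the config consistent
    return model


# Example checkpoint (any LLaMA-style model with the same layout would work).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)
n = model.config.num_hidden_layers
# Naive selection for illustration: keep every other block (~50% depth reduction).
model = drop_layers(model, range(0, n, 2))
```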
Submission Number: 121