Keywords: depth pruning, emergent ability, structured pruning, large language models, LLM deployment
TL;DR: A simple, modality-agnostic depth pruning method and a systematic evaluation of emergent abilities in pruned LLMs.
Abstract: Large foundation models face deployment challenges in resource-constrained environments. While width pruning typically outperforms depth pruning, we introduce LayerMerge, a simple, modality-agnostic depth pruning technique that closes the performance gap with width pruning while yielding linear reductions in inference time and memory. Extensive benchmarks show that LayerMerge preserves emergent abilities under aggressive compression, maintaining most of the original performance while reducing model depth by up to 90%.
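(For readers unfamiliar with depth pruning, the minimal sketch below illustrates the general idea of removing decoder blocks from a transformer, which is why savings in inference time and memory scale roughly linearly with the number of layers removed. This is not the LayerMerge algorithm itself; the model checkpoint, the `drop_layers` helper, and the keep-every-other-block selection are illustrative assumptions, and choosing *which* layers to drop or merge is precisely what methods like LayerMerge address.)

```python
# Hedged sketch: generic depth pruning by dropping decoder blocks.
# Assumes a LLaMA-style Hugging Face checkpoint whose blocks live in model.model.layers.
import torch
from torch import nn
from transformers import AutoModelForCausalLM


def drop_layers(model, keep_indices):
    """Keep only the decoder blocks listed in keep_indices (0-based)."""
    blocks = model.model.layers                       # nn.ModuleList of decoder blocks
    kept = nn.ModuleList(blocks[i] for i in sorted(keep_indices))
    model.model.layers = kept
    model.config.num_hidden_layers = len(kept)        # keep the config consistent
    return model


# Example checkpoint (any LLaMA-style model with the same layout would work).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)
n = model.config.num_hidden_layers
# Naive selection for illustration: keep every other block (~50% depth reduction).
model = drop_layers(model, range(0, n, 2))
```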
Submission Number: 121