Abstract: Recent advances in large language model (LLM) compression have predominantly focused on pruning and low-rank factorization, leaving weight sharing—despite its success in classical neural network compression—largely unexplored. We introduce \textsc{LayerDecompose}, a novel framework that reduces parameter redundancy by sharing a core weight matrix across transformer layers and augmenting each layer with lightweight, low-rank adapters. Unlike prior SVD- and pruning-based methods, our joint optimization of shared weights and residual adapters achieves a 30\% model size reduction while retaining 89\% of the original performance on seven standard benchmarks. Experiments on LLaMA-7B and three other 7B-parameter models demonstrate that \textsc{LayerDecompose} consistently outperforms state-of-the-art baselines. These results highlight the promise of combining weight sharing with low-rank adaptation for efficient, scalable LLM deployment.
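To make the core idea concrete, here is a minimal sketch (not the authors' released implementation) of the decomposition the abstract describes: a single core weight matrix shared across transformer layers, with each layer adding a lightweight low-rank residual adapter. All names (SharedLowRankLinear, rank, n_layers) and initialization choices are illustrative assumptions, not details taken from the paper.

# Minimal sketch, assuming a linear sublayer of shape (d_out, d_in) repeated
# across n_layers transformer blocks. Effective per-layer weight:
#   W_l = W_shared + B_l @ A_l   (rank-r residual, r << min(d_in, d_out))
# Parameter count drops from n_layers * d_out * d_in to
# d_out * d_in + n_layers * r * (d_in + d_out).
import math
import torch
import torch.nn as nn

class SharedLowRankLinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, n_layers: int, rank: int = 16):
        super().__init__()
        # Core weight shared by every layer.
        self.w_shared = nn.Parameter(torch.empty(d_out, d_in))
        nn.init.kaiming_uniform_(self.w_shared, a=math.sqrt(5))
        # Per-layer low-rank adapters; B is zero-initialized so each layer
        # starts out identical to the shared core.
        self.a = nn.ParameterList(
            [nn.Parameter(torch.randn(rank, d_in) * 0.01) for _ in range(n_layers)]
        )
        self.b = nn.ParameterList(
            [nn.Parameter(torch.zeros(d_out, rank)) for _ in range(n_layers)]
        )

    def forward(self, x: torch.Tensor, layer_idx: int) -> torch.Tensor:
        # Shared projection plus the layer-specific low-rank correction.
        shared = x @ self.w_shared.T
        residual = (x @ self.a[layer_idx].T) @ self.b[layer_idx].T
        return shared + residual

In this sketch, joint optimization simply means training w_shared and all (A_l, B_l) pairs together with the usual language-modeling loss; how the paper schedules or regularizes that optimization is not specified here.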
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: parameter-efficient-training, distillation, scaling
Contribution Types: NLP engineering experiment, Approaches for low compute settings-efficiency
Languages Studied: English
Submission Number: 6263