Keywords: structured pruning, depth compression, layer pruning, layer merging, Transformer, LLM
TL;DR: We propose FlattenGPT, a novel depth compression method for LLMs that employs layer flattening to bridge the gap between layer pruning and channel pruning.
Abstract: This work proposes FlattenGPT, a novel fine-grained depth compression method for transformers. Recent works have observed redundancy across transformer blocks, prompting research on depth compression that prunes less crucial blocks. However, pruning entire blocks risks discarding the knowledge learned in those blocks, leading to serious performance degradation. Channel pruning, on the other hand, better preserves performance, but it cannot compress model depth and is complicated by inconsistent pruning ratios across layers. To address these issues, our method introduces a novel operation named layer flattening, which bridges the gap between layer pruning and channel pruning. By converting two adjacent blocks into one, it compresses the network depth and enables fine-grained parameter removal. FlattenGPT strives to preserve the knowledge learned in all blocks while remaining consistent with the original architecture, improving model efficiency with a favorable trade-off against performance. Extensive experiments demonstrate that FlattenGPT outperforms existing pruning methods in both zero-shot accuracy and WikiText-2 perplexity across various model types and parameter sizes. It also surpasses other pruning methods in accelerating LLM inference, making it a promising approach for improving the efficiency of transformers.
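To make the idea of merging two adjacent blocks into one more concrete, below is a minimal illustrative sketch in PyTorch. It is an assumption-based approximation, not the actual FlattenGPT operation (the abstract does not specify the merge mechanism): two sequential residual MLP blocks are flattened into a single wider block by concatenating their hidden channels, so the merged block computes the two branches in parallel. The names `MLPBlock` and `flatten_blocks` are hypothetical.

```python
# Sketch of "layer flattening" for two adjacent MLP blocks (assumption only).
# Two sequential residual branches are approximated by one wider parallel
# branch x + f1(x) + f2(x), which halves depth while keeping every learned
# channel available for later fine-grained (channel-level) pruning.
import torch
import torch.nn as nn


class MLPBlock(nn.Module):
    """A plain pre-norm residual MLP block: x + W_down(act(W_up(LN(x))))."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.up = nn.Linear(d_model, d_hidden)
        self.act = nn.GELU()
        self.down = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.down(self.act(self.up(self.norm(x))))


@torch.no_grad()
def flatten_blocks(b1: MLPBlock, b2: MLPBlock) -> MLPBlock:
    """Merge two adjacent blocks into one wider block (hypothetical sketch).

    The merged block concatenates the hidden channels of b1 and b2, so its
    output is x + f1(x) + f2(x): a parallel approximation of the original
    sequential composition. All channels of both blocks are preserved.
    """
    d_model = b1.up.in_features
    h1, h2 = b1.up.out_features, b2.up.out_features
    merged = MLPBlock(d_model, h1 + h2)
    # Reuse b1's normalization; treating b2's norm as identical is part of
    # the approximation made by this sketch.
    merged.norm.load_state_dict(b1.norm.state_dict())
    merged.up.weight.copy_(torch.cat([b1.up.weight, b2.up.weight], dim=0))
    merged.up.bias.copy_(torch.cat([b1.up.bias, b2.up.bias], dim=0))
    merged.down.weight.copy_(torch.cat([b1.down.weight, b2.down.weight], dim=1))
    merged.down.bias.copy_(b1.down.bias + b2.down.bias)
    return merged


# Usage: two depth-2 blocks become a single block of doubled hidden width.
x = torch.randn(4, 16, 64)
b1, b2 = MLPBlock(64, 256), MLPBlock(64, 256)
flat = flatten_blocks(b1, b2)
print(flat(x).shape)  # torch.Size([4, 16, 64])
```

Because the flattened block keeps all hidden channels of both original blocks, standard channel pruning can then remove the least important ones at a uniform ratio, which is the gap-bridging behavior the abstract describes.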
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 11797