Keywords: Depth pruning, Activation function pruning, Practical speedups, Transformer compression
TL;DR: We push the boundaries of ViT depth pruning by introducing BoundaryDPT, which achieves a 1.58× speedup on DeiT-B while maintaining accuracy. Combined with width pruning, BoundaryDPT+ advances the SOTA speedup from 4.24× to 5.19×.
Abstract: While prior studies have successfully compressed vision Transformers (ViTs) through various pruning techniques, most have concentrated on width pruning to achieve significant reductions in model size. Depth pruning, which removes entire layers from a ViT, usually yields higher speedups than width pruning, but recovering accuracy afterwards is notoriously difficult. Consequently, existing joint approaches that combine width and depth pruning have achieved only limited acceleration ratios, held back by inefficient depth pruning methods.
To tackle the challenges in depth pruning, this work introduces BoundaryDPT, a novel depth pruning method that targets the redundancy of both attention layers and nonlinearity within ViTs. To the best of our knowledge, we are the first to propose pruning activation function layers in ViTs. By reducing the redundancy of nonlinearity, instead of directly targeting linear layers, the depth of a ViT is naturally reduced without incurring dimension mismatch. Moreover, we present a two-stage joint pruning method designed to address the heterogeneity of attention layers and activation function layers.
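One way to see why removing an activation function reduces depth without a dimension mismatch: two linear layers with nothing nonlinear between them compose into a single linear layer, since $W_2(W_1 x + b_1) + b_2 = (W_2 W_1)x + (W_2 b_1 + b_2)$. The sketch below checks this identity numerically in plain Python. It is an illustration of the general algebra only; the helper names (`matvec`, `matmul`, `merge_linear`) and the toy sizes are assumptions made for this example, and the abstract does not state that BoundaryDPT merges layers in exactly this way.

```python
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

def merge_linear(W1, b1, W2, b2):
    """Fold y = W2 (W1 x + b1) + b2 into a single layer y = W x + b."""
    W = matmul(W2, W1)
    b = add(matvec(W2, b1), b2)
    return W, b

# Hypothetical toy sizes: an MLP block mapping d -> h -> d.
rng = random.Random(0)
d, h = 4, 16

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def rand_vector(n):
    return [rng.uniform(-1, 1) for _ in range(n)]

W1, b1 = rand_matrix(h, d), rand_vector(h)
W2, b2 = rand_matrix(d, h), rand_vector(d)
x = rand_vector(d)

# Two linear layers applied in sequence, with the activation removed.
two_layer = add(matvec(W2, add(matvec(W1, x), b1)), b2)

# The equivalent single merged layer.
W, b = merge_linear(W1, b1, W2, b2)
one_layer = add(matvec(W, x), b)

# Largest absolute difference between the two outputs (floating-point error only).
max_err = max(abs(p - q) for p, q in zip(two_layer, one_layer))
```

In this toy setting the merged weight `W` has shape `d × d`, so the input and output dimensions of the block are unchanged, which is why the surrounding layers need no adjustment.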
Comprehensive experiments on ImageNet-1k, CIFAR-100, and ADE20K validate our methods. First, BoundaryDPT achieves a 1.58$\times$ speedup for DeiT-B while maintaining accuracy, and a 1.39$\times$ speedup for DeiT-S with nearly lossless accuracy. Furthermore, when combined with width pruning (referred to as BoundaryDPT+), our method sets a new state of the art in ViT pruning. For instance, BoundaryDPT+ raises the acceleration ratio from 4.24$\times$ to 5.19$\times$ for the Isomorphic-Pruning-2.6G configuration while maintaining near-lossless accuracy, establishing new benchmarks in extreme ViT compression.
Supplementary Material: zip
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Submission Number: 6030