AdaPrune: Pruning Transformer with Sparse Regularization

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission
Abstract: The key components of the transformer architecture are multi-head self-attention (MHA) and the feed-forward network (FFN). In this paper, we reveal that, across many applications, the MHA component is nonsymmetric and the FFN component is sparse. Leveraging this observation, we propose a new method, AdaPrune, which uses sparse regularization to perform structured pruning of the MHA and FFN modules. The method selects task-specific valuable heads in the multi-head attention modules and effective blocks in the feed-forward layers during the fine-tuning stage, while maintaining the performance of the full transformer model. Extensive experiments show that AdaPrune achieves competitive performance on these tasks while significantly reducing the computational cost.
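Only the abstract is available here, so the paper's actual implementation is not shown. Below is a minimal PyTorch sketch of the general idea it describes: attach a learnable scalar gate to each attention head and to each block of FFN hidden units, and add a sparsity penalty on the gates to the fine-tuning loss so that unimportant heads and blocks are driven toward zero and can then be pruned. All names (GatedMHA, GatedFFN, sparsity_loss) and the specific L1 form of the penalty are assumptions for illustration, not details taken from the paper.

```python
# Illustrative sketch only; module/function names and the L1 penalty are assumptions,
# not the paper's actual AdaPrune implementation.
import torch
import torch.nn as nn


class GatedMHA(nn.Module):
    """Self-attention with one learnable gate per head."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        # One gate per head; gates pushed to zero mark prunable heads.
        self.head_gates = nn.Parameter(torch.ones(n_heads))

    def forward(self, x):
        B, T, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        split = lambda t: t.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)                  # (B, H, T, head_dim)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        heads = attn @ v                                        # (B, H, T, head_dim)
        heads = heads * self.head_gates.view(1, -1, 1, 1)       # gate each head's output
        return self.proj(heads.transpose(1, 2).reshape(B, T, d))


class GatedFFN(nn.Module):
    """Feed-forward layer whose hidden units are grouped into gated blocks."""

    def __init__(self, d_model: int, d_ff: int, n_blocks: int):
        super().__init__()
        assert d_ff % n_blocks == 0
        self.fc1 = nn.Linear(d_model, d_ff)
        self.fc2 = nn.Linear(d_ff, d_model)
        self.n_blocks = n_blocks
        # One gate per block of hidden units.
        self.block_gates = nn.Parameter(torch.ones(n_blocks))

    def forward(self, x):
        h = torch.relu(self.fc1(x))                             # (B, T, d_ff)
        B, T, d_ff = h.shape
        h = h.view(B, T, self.n_blocks, d_ff // self.n_blocks)
        h = h * self.block_gates.view(1, 1, -1, 1)              # gate each block
        return self.fc2(h.reshape(B, T, d_ff))


def sparsity_loss(model: nn.Module, lam: float = 1e-3) -> torch.Tensor:
    """L1 penalty over all gate parameters, added to the task loss during fine-tuning."""
    penalty = sum(p.abs().sum() for n, p in model.named_parameters() if "gates" in n)
    return lam * penalty
```

In this kind of setup, one would fine-tune with `task_loss + sparsity_loss(model)` and afterwards remove the heads and FFN blocks whose gates fall below a small threshold from the weight matrices, which is what yields the structured reduction in computation the abstract refers to.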