PASS: Pruning Attention Heads with Almost-sure Sparsity Targets

Published: 16 Sept 2024, Last Modified: 17 Sept 2024. Accepted by TMLR. License: CC BY 4.0
Abstract: Transformer models have been widely used to achieve high accuracy in multiple fields including natural language processing (NLP), computer vision, and more. This superior performance typically comes at the expense of substantial computational overhead. Multi-head attention is a key factor in the success of Transformer models, but it has also been found to be computationally expensive. Significant research effort has been devoted to improving attention compute efficiency by pruning redundant attention heads. A widely adopted paradigm is to jointly learn a set of gate variables and apply thresholds on gate values to prune heads. Previous work shows a high level of sensitivity to threshold tuning, which can limit subnetwork performance and prevent such methods from wider adoption in practice. We propose the notion of almost-sure sparsity to overcome this limitation and develop a generic framework for Pruning with Almost-Sure Sparsity (PASS) targets over attention heads. To further boost efficiency, we design a novel technique, the concentrator, based on which we develop PASSCONC (PASS with CONCentrator). We also present a simple yet effective strategy to further improve subnetwork performance by clipping and selectively reopening learned gates. We investigate PASS and PASSCONC on two widely studied architectures: the encoder-decoder (ED) Transformer and the encoder-only Transformer (e.g., BERT-base). Experiments on IWSLT14 German-to-English translation and GLUE benchmark tasks demonstrate that our approaches outperform the SOTA by achieving up to 1.33 higher BLEU scores, 1.44% higher accuracy, and 60% higher attention speedups.
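To make the gate-and-threshold paradigm referenced in the abstract concrete, below is a minimal, illustrative sketch of gate-based attention-head pruning. It is not the PASS/PASSCONC method described in the paper; the module name, the use of sigmoid gates, and the fixed inference-time threshold are assumptions made only for illustration.

```python
# Illustrative sketch of the generic gate-and-threshold head-pruning paradigm.
# NOT the PASS/PASSCONC method; class name, sigmoid gates, and threshold are assumptions.
import math
import torch
import torch.nn as nn


class GatedMultiHeadSelfAttention(nn.Module):
    """Multi-head self-attention with one learnable gate per head."""

    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)
        # One gate logit per head, learned jointly with the model weights
        # (typically under a sparsity-inducing penalty on the gates).
        self.gate_logits = nn.Parameter(torch.zeros(num_heads))

    def forward(self, x: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape each projection to (batch, heads, seq, head_dim).
        q, k, v = (z.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
                   for z in (q, k, v))
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(self.head_dim), dim=-1)
        heads = attn @ v  # (batch, heads, seq, head_dim)

        gates = torch.sigmoid(self.gate_logits)  # soft per-head gates during training
        if not self.training:
            # Threshold the learned gates at inference: heads whose gate value
            # falls below the threshold are pruned (their output is zeroed).
            gates = (gates >= threshold).float()
        heads = heads * gates.view(1, self.num_heads, 1, 1)

        out = heads.transpose(1, 2).reshape(b, t, d)
        return self.out_proj(out)


# Example usage (shapes only; no trained gates involved):
# mha = GatedMultiHeadSelfAttention(embed_dim=512, num_heads=8)
# y = mha(torch.randn(2, 10, 512))
```

The paper's motivation is precisely that subnetwork quality in this paradigm is sensitive to how the inference-time threshold is chosen; PASS replaces manual threshold tuning with almost-sure sparsity targets.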
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=AzuyAJ545D
Changes Since Last Submission: Addressed paper format issues (e.g., font) to ensure adherence to format requirements.
Assigned Action Editor: ~Marwa_El_Halabi1
Submission Number: 2868