Pruning Attention Heads with Almost-sure Sparsity Targets

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Transformer, Multi-head Attention, Model Pruning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Transformer-based architectures have been widely used to achieve high accuracy in multiple fields, including natural language processing (NLP) and computer vision. Multi-head attention, the key factor in the success of Transformer-based architectures, has been found to be computationally expensive. Significant research effort has been devoted to improving attention compute efficiency by reducing self-attention complexity or pruning redundant attention heads. Previous pruning work either presents training-testing inconsistency or enforces hard structural constraints that limit model performance. We propose the notion of almost-sure sparsity to overcome these limitations and develop a generic framework for Pruning with Almost-Sure Sparsity (PASS) targets over attention heads. To further boost efficiency, we design a novel technique, the concentrator, on which we build PASSCONC (PASS with CONCentrator). We investigate PASS and PASSCONC on two widely studied architectures: the encoder-decoder (ED) Transformer and BERT. Experiments on IWSLT14 German-to-English translation and GLUE benchmark tasks demonstrate that our approaches outperform the SOTA by up to 1.33 BLEU points, 1.44% accuracy, and 60% higher attention-layer speedups.
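
As a rough illustration of the attention-head-pruning setup the abstract refers to (not the paper's PASS or concentrator mechanisms, whose details are not given here), the following minimal PyTorch sketch attaches a learnable soft gate to each attention head and adds a penalty that drives the expected number of kept heads toward a target count. The class and function names (GatedMultiheadSelfAttention, head_sparsity_penalty) and the quadratic penalty are illustrative assumptions, not the authors' method.

```python
import torch
import torch.nn as nn


class GatedMultiheadSelfAttention(nn.Module):
    """Self-attention with one learnable soft gate per head (illustrative sketch).

    Heads whose gates collapse toward zero during training can be removed
    afterwards without changing the remaining computation.
    """

    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)
        # One logit per head; sigmoid(logit) acts as a soft "keep" probability.
        self.gate_logits = nn.Parameter(torch.zeros(num_heads))

    def head_keep_probs(self) -> torch.Tensor:
        return torch.sigmoid(self.gate_logits)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(z):
            # Reshape to (batch, heads, time, head_dim).
            return z.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        heads = attn @ v                                     # (b, h, t, head_dim)
        # Gate each head's output before the heads are mixed by out_proj.
        heads = heads * self.head_keep_probs().view(1, -1, 1, 1)
        return self.out_proj(heads.transpose(1, 2).reshape(b, t, d))


def head_sparsity_penalty(layer: GatedMultiheadSelfAttention,
                          target_heads: float) -> torch.Tensor:
    """Quadratic penalty pushing the expected number of kept heads toward a target."""
    return (layer.head_keep_probs().sum() - target_heads) ** 2


# Toy usage: combine a task loss with the sparsity penalty, then prune heads
# whose keep probability falls below a threshold after training.
layer = GatedMultiheadSelfAttention(embed_dim=64, num_heads=8)
x = torch.randn(4, 16, 64)                                   # (batch, seq_len, embed_dim)
task_loss = layer(x).pow(2).mean()                           # stand-in for the real objective
loss = task_loss + 0.1 * head_sparsity_penalty(layer, target_heads=2)
loss.backward()
```

The soft gates make head removal differentiable; the paper's almost-sure sparsity targets and concentrator presumably replace this naive penalty with mechanisms that avoid the training-testing inconsistency and hard structural constraints criticized in the abstract.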
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7325