Head Information Bottleneck: An Evaluation Method for Transformer Head Contributions in Speech Task

23 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: Attribution, Informational Bottleneck, Multi-Head Attention, Explainable AI
TL;DR: We apply the Variational Information Bottleneck to the Multi-Head Attention mechanism.
Abstract: Multi-head attention is widely used in speech pre-training, but the roles and effectiveness of individual heads in downstream tasks have not been fully studied, and different heads may vary in importance from task to task. We observe that attention allocation resembles an information bottleneck: both aim to highlight the parts of the input that matter for the task. We therefore introduce an information bottleneck into multi-head attention to estimate the mutual information between each head's output and the input, and to force each head to focus on useful information. We also propose a method for measuring each head's contribution to a task, and we prune heads according to these contributions, which gives model pruning an interpretable basis. Notably, our method maintains an accuracy of 83.36% on the KS task while pruning 40% of the heads.
Supplementary Material: zip
Primary Area: visualization or interpretation of learned representations
Submission Number: 7938
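
The abstract describes two steps: a variational information bottleneck applied per attention head, and pruning of heads ranked by contribution. The sketch below illustrates the general shape of both under stated assumptions and is not the authors' implementation. It assumes each head's bottleneck is a diagonal Gaussian compared against a standard normal prior (the usual choice in Variational Information Bottleneck work), and that some per-head contribution score is already available. The function names `gaussian_kl_to_standard_normal` and `prune_mask`, and the example scores, are hypothetical.

```python
import numpy as np


def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over the last axis.

    Assumption: a diagonal-Gaussian bottleneck per head. This KL term is the
    standard variational upper bound on the compression term of a VIB loss.
    """
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


def prune_mask(scores, prune_ratio):
    """Return a boolean mask that keeps the highest-scoring heads.

    The round(prune_ratio * n) heads with the lowest scores are marked False.
    The scores are placeholders for whatever contribution measure is used.
    """
    scores = np.asarray(scores, dtype=float)
    n_prune = int(round(prune_ratio * scores.size))
    mask = np.ones(scores.size, dtype=bool)
    if n_prune > 0:
        lowest = np.argsort(scores, kind="stable")[:n_prune]
        mask[lowest] = False
    return mask


# Hypothetical contribution scores for five heads; prune 40% of them.
example_scores = [0.9, 0.1, 0.5, 0.3, 0.7]
example_mask = prune_mask(example_scores, prune_ratio=0.4)
```

Here `example_mask` keeps the three heads with the largest scores. In a real model the mask would be applied to the per-head outputs before the attention output projection; how the contribution scores are derived is specified in the paper, not here.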