Enhancing the Nonlinear Mutual Dependencies in Transformers with Mutual Information

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission
Abstract: Transformers suffer from the predictive uncertainty problem. We show that pre-trained Transformers can be further regularized with mutual information to alleviate this issue in Neural Machine Translation (NMT). In this paper, we explicitly capture the nonlinear mutual dependencies between the two types of attention in the decoder to reduce the model's uncertainty about token-token interactions. Specifically, we adopt an unsupervised objective that maximizes the mutual information over self-attentions using a contrastive learning methodology, and we estimate the mutual information with InfoNCE. Experimental results on WMT'14 En$\rightarrow$De and WMT'14 En$\rightarrow$Fr demonstrate consistent effectiveness and clear improvements of our model over strong baselines. Quantifying the model uncertainty further verifies our hypothesis. The proposed plug-and-play approach can be easily incorporated into and deployed with pre-trained Transformer models. Code will be released soon.
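The abstract describes maximizing a mutual information lower bound over decoder attention representations via InfoNCE. The sketch below illustrates the general InfoNCE objective only; the names (`info_nce_loss`, `self_attn_pooled`, `cross_attn_pooled`, `temperature`, `mi_weight`) and the pooling/usage are assumptions for illustration, not the paper's actual implementation.

```python
# Minimal InfoNCE sketch: a contrastive lower bound on mutual information
# between two batches of paired representations (e.g., pooled decoder
# self-attention and cross-attention outputs). Purely illustrative.
import torch
import torch.nn.functional as F

def info_nce_loss(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE loss for paired representations.

    z_a, z_b: [batch, dim]; row i of z_a is the positive pair of row i of z_b,
    and all other rows in the batch serve as negatives. Minimizing this loss
    maximizes a lower bound on I(z_a; z_b): I >= log(batch) - InfoNCE.
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature                      # [batch, batch] similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)    # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Hypothetical usage: add the MI term to the standard NMT training loss.
# total_loss = nmt_loss + mi_weight * info_nce_loss(self_attn_pooled, cross_attn_pooled)
```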