Don’t Bet on Sparsity: Designing Brain-inspired Distance-preserving Encoder

22 Sept 2022 (modified: 13 Feb 2023) | ICLR 2023 Conference Withdrawn Submission | Readers: Everyone
Keywords: Orthogonal attention, Lipschitz, Entropic Transformer
Abstract: Transformers built on multi-headed self-attention have been a central area of research for quite some time. Although they have significantly improved the modeling of short- and long-term context in sequences, the encoders of the Transformer and its variants fail to preserve layer-wise contextual information. Further, the text representations learned by Transformer-based encoders usually have low entropy and low variance, which contrasts with typical human brain function. In this work, we propose TransJect, an encoder model that guarantees a theoretical bound on layer-wise distance preservation between any pair of tokens. We propose a simple alternative to dot-product attention that ensures Lipschitz continuity, allowing TransJect to learn injective mappings that transform token representations onto different manifolds while preserving the Euclidean distance between every pair of tokens in subsequent layers. Our evaluation on several benchmark short- and long-sequence classification tasks shows remarkable average improvements of 3.1% and 11%, respectively. Furthermore, empirical results suggest that TransJect is layer-agnostic; in fact, it prefers shallower architectures over deeper ones and prevents layer-wise incremental learning beyond a threshold. Our empirical analyses also demonstrate the generalization capabilities of TransJect and its robustness under different hyperparameter configurations. We conduct a detailed statistical analysis to confirm the necessity of high-entropy representations for achieving human-like cognition.
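
As an illustration of the distance-preservation property described in the abstract (this is not the authors' TransJect architecture, which is not shown on this page), the minimal Python sketch below shows that an orthogonal token-mixing map is 1-Lipschitz and leaves every pairwise Euclidean distance between token representations unchanged. The array shapes and the helper name `pairwise_dists` are hypothetical choices made only for this example.

```python
# Minimal sketch (assumption: orthogonal mixing as a stand-in for a
# Lipschitz, distance-preserving layer; not the paper's actual method).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical token representations: 8 tokens, model dimension 16.
X = rng.standard_normal((8, 16))

# Sample a random orthogonal matrix Q via QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((16, 16)))

# Mixing every token with the same orthogonal map is an isometry.
Y = X @ Q

def pairwise_dists(Z):
    """Euclidean distance between every pair of rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# All pairwise distances are preserved after the transformation.
print(np.allclose(pairwise_dists(X), pairwise_dists(Y)))  # True
```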
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning
Supplementary Material: zip
