Tiny Attention: A Simple yet Effective Method for Learning Contextual Word Embeddings

16 Mar 2023 (modified: 30 Apr 2023) · Submitted to Tiny Papers @ ICLR 2023
Keywords: contextual embeddings, syntagmatic associations, attention mechanism
TL;DR: A simple method to learn attention based contextual word embeddings using just SVD!
Abstract: Contextual Word Embeddings (CWEs) obtained via the Attention Mechanism in Transformer (AMT) models are one of the key drivers of the current revolution in Natural Language Processing. Previous techniques for learning CWEs are not only inferior to AMT but also largely fall short of the simple bag-of-words baseline. Though there have been many variants of the Transformer model, the attention mechanism itself remains unchanged and is largely opaque. We introduce a new method for learning CWEs that uses a simple and transparent attention mechanism. Our method is derived from the SVD-based Syntagmatic Word Embeddings, which capture word associations. We test our model on the Word-in-Context dataset and show that it outperforms the simple but tough-to-beat baseline by a substantial margin.
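The abstract does not spell out the algorithm, but the general recipe it points to (SVD over a co-occurrence matrix for static embeddings, then a simple attention-weighted mix over context words to contextualize them) can be sketched as follows. This is a minimal illustration under assumed choices (a toy corpus, a ±2-word window, PPMI reweighting, dot-product attention), not the paper's actual method:

```python
import numpy as np

# Toy corpus (hypothetical, for illustration only -- not the paper's data).
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat and a dog played".split(),
]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Symmetric co-occurrence counts within a +/-2 word window.
C = np.zeros((V, V))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 2), min(len(sent), i + 3)):
            if j != i:
                C[idx[w], idx[sent[j]]] += 1

# Positive PMI reweighting (a common preprocessing step before SVD).
total = C.sum()
row = C.sum(axis=1, keepdims=True)
col = C.sum(axis=0, keepdims=True)
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((C * total) / (row * col))
ppmi = np.where(np.isfinite(pmi), np.maximum(pmi, 0.0), 0.0)

# Truncated SVD yields dense static word embeddings.
U, S, _ = np.linalg.svd(ppmi)
d = 4
E = U[:, :d] * S[:d]  # one d-dimensional vector per vocabulary word

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def contextual_embedding(sentence, t):
    """Embed sentence[t] as an attention-weighted combination of the
    static embeddings of all tokens in the sentence."""
    vecs = np.stack([E[idx[w]] for w in sentence])
    scores = vecs @ vecs[t]      # dot-product similarity to the target token
    weights = softmax(scores)    # transparent attention distribution
    return weights @ vecs        # convex combination of context vectors

cwe = contextual_embedding("the cat sat on the mat".split(), 1)
print(cwe.shape)  # (4,)
```

The same surface form receives different vectors in different sentences because the attention weights depend on the surrounding context, which is the property the Word-in-Context benchmark tests.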