- Keywords: Pretrained Word Representations, Lightweight Representations, NLP, Social Bias, Word Embeddings
- TL;DR: A procedure for distilling contextual models into static embeddings; we apply our method to 9 popular models and demonstrate clear gains in representation quality wrt Word2Vec/GloVe and improved analysis potential by thoroughly studying social bias.
- Abstract: Contextualized word representations such as ELMo and BERT have become the de facto starting point for incorporating pretrained representations for downstream NLP tasks. In these settings, contextual representations have largely made obsolete their static embedding predecessors such as Word2Vec and GloVe. However, static embeddings do have their advantages in that they are straightforward to understand and faster to use. Additionally, embedding analysis methods for static embeddings are far more diverse and mature than those available for their dynamic counterparts. In this work, we introduce simple methods for generating static lookup table embeddings from existing pretrained contextual representations and demonstrate they outperform Word2Vec and GloVe embeddings on a variety of word similarity and word relatedness tasks. In doing so, our results also reveal insights that may be useful for subsequent downstream tasks using our embeddings or the original contextual models. Further, we demonstrate the increased potential for analysis by applying existing approaches for estimating social bias in word embeddings. Our analysis constitutes the most comprehensive study of social bias in contextual word representations (via the proxy of our distilled embeddings) and reveals a number of inconsistencies in current techniques for quantifying social bias in word embeddings. We publicly release our code and distilled word embeddings to support reproducible research and the broader NLP community.
- Code: https://github.com/AnonymousICLR2020Submission/BERT-Wears-GloVes