All-but-the-Top: Simple and Effective Postprocessing for Word Representations

15 Feb 2018 (modified: 21 Apr 2024) · ICLR 2018 Conference Blind Submission
Abstract: Real-valued word representations have transformed NLP applications; popular examples are word2vec and GloVe, recognized for their ability to capture linguistic regularities. In this paper, we demonstrate a *very simple*, and yet counter-intuitive, postprocessing technique -- eliminating the common mean vector and a few top dominating directions from the word vectors -- that renders off-the-shelf representations *even stronger*. The postprocessing is empirically validated on a variety of lexical-level intrinsic tasks (word similarity, concept categorization, word analogy) and sentence-level tasks (semantic textual similarity and text classification), across multiple datasets, representation methods, hyperparameter choices, and languages; in each case, the processed representations are consistently better than the original ones.
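
The postprocessing described in the abstract reduces to a few linear-algebra steps. Below is a minimal NumPy sketch (the function name `all_but_the_top` and its signature are illustrative, not from the paper's official code): subtract the common mean, find the top principal directions of the centered vectors, and remove the projections onto them. The paper suggests removing roughly d/100 directions for d-dimensional vectors.

```python
import numpy as np

def all_but_the_top(embeddings, n_components):
    """Remove the common mean vector and the projections onto the
    top principal components from a set of word vectors.

    embeddings:   (vocab_size, dim) array of word vectors
    n_components: number of dominating directions to remove
                  (the paper suggests around dim / 100)
    """
    # Step 1: subtract the common mean vector.
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean

    # Step 2: top principal directions of the centered vectors
    # (rows of Vt from the SVD are the principal components).
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    top = Vt[:n_components]  # shape: (n_components, dim)

    # Step 3: subtract the projection onto each dominating direction.
    projections = centered @ top.T @ top
    return centered - projections
```

For example, with 300-dimensional GloVe vectors one would remove about 3 directions: `processed = all_but_the_top(vectors, n_components=3)`.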
Code: [Papers with Code: 4 community implementations](https://paperswithcode.com/paper/?openreview=HkuGJ3kCb)
Data: [IMDb Movie Reviews](https://paperswithcode.com/dataset/imdb-movie-reviews), [MR](https://paperswithcode.com/dataset/mr), [SICK](https://paperswithcode.com/dataset/sick), [SST](https://paperswithcode.com/dataset/sst), [SST-5](https://paperswithcode.com/dataset/sst-5), [SUBJ](https://paperswithcode.com/dataset/subj)
Community Implementations: [CatalyzeX: 3 code implementations](https://www.catalyzex.com/paper/arxiv:1702.01417/code)
