Decoding Decoders: Finding Optimal Representation Spaces for Unsupervised Similarity Tasks


Nov 03, 2017 (modified: Nov 03, 2017) · ICLR 2018 Conference Blind Submission
  • Abstract: Experimental evidence indicates that shallow bag-of-words models outperform complex deep networks on many unsupervised similarity tasks. Introducing the concept of an optimal representation space, we provide a simple theoretical resolution to this apparent paradox. In addition, we present a straightforward procedure that, without any retraining or architectural modifications, allows deep recurrent models to perform on par with (and sometimes better than) shallow models. To validate our analysis, we conduct a set of consistent empirical evaluations and introduce several new sentence embedding models in the process. While the current work is presented within the context of natural language processing, the insights are applicable to the entire field of representation learning.
  • TL;DR: By introducing the notion of an optimal representation space, we provide a theoretical argument and experimental validation that an unsupervised model for sentences can perform well on both supervised similarity and unsupervised transfer tasks.
  • Keywords: distributed representations, sentence embedding, representation learning, unsupervised learning, encoder-decoder, RNN