Is Wasserstein all you need?

Anonymous

Sep 27, 2018 (modified: Oct 10, 2018) ICLR 2019 Conference Blind Submission
  • Abstract: We propose a unified framework for building unsupervised representations of entities and their compositions by viewing each entity as a histogram over its contexts. This lets us use optimal transport to construct representations that exploit the geometry of the underlying space in which the contexts live. Our method captures uncertainty by modelling entities as distributions, and the optimal transport map makes the resulting comparisons interpretable. Together these give a new perspective on building rich and powerful feature representations. As a guiding example, we formulate unsupervised representations for text and evaluate them on tasks such as sentence similarity and word entailment detection. Empirical results show clear advantages of the proposed framework. The approach applies to any unsupervised or supervised problem, on text or other modalities, that has a co-occurrence structure, such as sequence data in general. The key tools at the core of this framework are Wasserstein distances and Wasserstein barycenters, hence the question posed in our title.
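As a rough illustration of how "a histogram over contexts" plus a ground metric on the context space yields a distance between entities, here is a minimal sketch in plain Python. It is not the authors' implementation. It approximates the Wasserstein distance with entropy-regularized optimal transport solved by Sinkhorn iterations, which is a swapped-in solver choice. The function name `sinkhorn_distance`, the context words, their 2-D embeddings, the entity histograms, and the regularization strength `eps` are all hypothetical.

```python
import math


def sinkhorn_distance(a, b, cost, eps=0.05, n_iter=1000):
    """Approximate OT cost between histograms a and b (hypothetical helper).

    a, b: nonnegative weights over contexts, each summing to 1.
    cost: cost[i][j] is the ground distance between context i and context j.
    eps: entropic regularization strength (illustrative default, not from the paper).
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Rescale rows so the plan's row sums match a
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so the plan's column sums match b
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P_ij = u_i * K_ij * v_j; return the transport cost <P, C>
    return sum(u[i] * K[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))


# Hypothetical 2-D embeddings of three context words (the "ground space").
contexts = {"cat": (0.0, 0.0), "dog": (0.2, 0.1), "car": (3.0, 0.0)}
names = list(contexts)
cost = [[math.dist(contexts[p], contexts[q]) for q in names] for p in names]

# Hypothetical entities, each a histogram over the contexts above.
kitten = [0.8, 0.1, 0.1]
puppy = [0.1, 0.8, 0.1]
truck = [0.1, 0.1, 0.8]

print(sinkhorn_distance(kitten, puppy, cost))  # small: cat and dog contexts are close
print(sinkhorn_distance(kitten, truck, cost))  # large: cat and car contexts are far
```

The example shows why the geometry of the context space matters: the plain Euclidean distance between the histograms `kitten` and `puppy` equals that between `kitten` and `truck` (both differ by 0.7 in two coordinates), yet the transport cost separates them because moving mass from "cat" to "dog" is cheap and moving it to "car" is expensive. Computing `exp(-C / eps)` directly underflows when costs are large relative to `eps`; practical solvers typically run the same iterations in the log domain. Wasserstein barycenters, the other tool the abstract names, are not covered by this sketch.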
