Wasserstein is all you need

Sidak Pal Singh, Andreas Hug, Aymeric Dieuleveut, Martin Jaggi

05 Jun 2018 (modified: 21 Feb 2020) | OpenReview Anonymous Preprint Blind Submission | Readers: Everyone

Abstract: We propose a unified framework for building unsupervised representations of individual objects or entities (and their compositions) by associating with each object both a distributional estimate and a point estimate (vector embedding). This is made possible by optimal transport, which lets us build these associated estimates while harnessing the underlying geometry of the ground space. Our method gives a novel perspective for building rich and powerful feature representations that simultaneously capture uncertainty (via the distributional estimate) and interpretability (via the optimal transport map). As a guiding example, we formulate unsupervised representations for text, in particular for sentence representation and entailment detection. Empirical results show strong advantages gained through the proposed framework. This approach can be used for any unsupervised or supervised problem (on text or other modalities) with a co-occurrence structure, such as any sequence data. The key tools underlying the framework are Wasserstein distances and Wasserstein barycenters (hence the title!). Please refer to https://arxiv.org/abs/1808.09663 for the latest version.

Keywords: representation learning, Wasserstein distance, Wasserstein barycenter, optimal transport, entailment, NLP

TL;DR: Represent each entity based on its histogram of contexts and then Wasserstein is all you need!
