Hierarchical Density Order Embeddings

Ben Athiwaratkun; Andrew Gordon Wilson

15 Feb 2018 (modified: 22 Jun 2025) | ICLR 2018 Conference Blind Submission | Readers: Everyone

Abstract: By representing words with probability densities rather than point vectors, probabilistic word embeddings can capture rich and interpretable semantic information and uncertainty (Vilnis & McCallum, 2014; Athiwaratkun & Wilson, 2017). The uncertainty information can be particularly meaningful in capturing entailment relationships – whereby general words such as “entity” correspond to broad distributions that encompass more specific words such as “animal” or “instrument”. We introduce density order embeddings, which learn hierarchical representations through encapsulation of probability distributions. In particular, we propose simple yet effective loss functions and distance metrics, as well as graph-based schemes to select negative samples to better learn hierarchical probabilistic representations. Our approach provides state-of-the-art performance on the WordNet hypernym relationship prediction task and the challenging HyperLex lexical entailment dataset – while retaining a rich and interpretable probabilistic representation.
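
The encapsulation idea in the abstract can be illustrated with a small sketch. The abstract does not state which densities, divergence, or loss are used, so everything here is an assumption made for illustration: diagonal Gaussian embeddings, KL divergence as the asymmetric measure, a hinge with a hypothetical threshold `gamma`, and toy values for "entity" and "animal".

```python
import math

def kl_diag_gaussian(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two diagonal Gaussians given as lists of means and variances."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total

def order_penalty(specific, general, gamma=0.0):
    """Hinge penalty (assumed form): zero once KL(specific || general) <= gamma."""
    return max(0.0, kl_diag_gaussian(*specific, *general) - gamma)

# Hypothetical toy embeddings as (means, variances): a broad "entity"
# density and a narrow "animal" density lying inside it.
entity = ([0.0, 0.0], [4.0, 4.0])
animal = ([0.5, -0.5], [0.5, 0.5])

forward = kl_diag_gaussian(*animal, *entity)   # specific inside general: small
backward = kl_diag_gaussian(*entity, *animal)  # general inside specific: large
print(forward < backward)
```

Under these assumptions the divergence is asymmetric: it is small when the narrow density sits inside the broad one and large in the reverse direction, which is the property that lets a density-based embedding encode a hypernym direction.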

Keywords: embeddings, word embeddings, probabilistic embeddings, hierarchical representation, probabilistic representation, order embeddings, wordnet, hyperlex

Code: [benathi/density-order-emb](https://github.com/benathi/density-order-emb) + [1 community implementation](https://paperswithcode.com/paper/?openreview=HJCXZQbAZ)

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/hierarchical-density-order-embeddings/code)

Data: [HyperLex](https://paperswithcode.com/dataset/hyperlex)
