ON MODELING HIERARCHICAL DATA VIA ENCAPSULATION OF PROBABILITY DENSITIES
Nov 07, 2017 (modified: Nov 07, 2017) — ICLR 2018 Conference Blind Submission
Abstract: By representing words with probability densities rather than point vectors, probabilistic word embeddings can capture rich and interpretable semantic information and uncertainty (Vilnis & McCallum, 2014; Athiwaratkun & Wilson, 2017). For example, such embeddings trained on an unlabelled corpus can represent lexical entailment, where the learned distribution for a general concept such as "animal" can contain the distributions for more specific words such as "dog" or "cat". However, for some words such as "mammal", the entailment signal is often weak, since "mammal" is usually not used in place of "dog" or "cat" in natural sentences. In this paper, we develop the use of density representations to specifically model hierarchical data. We introduce simple yet effective loss functions and distance metrics, as well as graph-based schemes to select negative samples, to better learn hierarchical probabilistic representations. Our methods outperform the original methodology proposed by Vilnis & McCallum (2014) by a significant margin and provide state-of-the-art performance on the WordNet hypernym relationship prediction task and the challenging HyperLex lexical entailment dataset.
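The abstract's notion of one density "containing" another can be illustrated with an asymmetric divergence between Gaussian embeddings. The sketch below is an assumption for illustration only: it uses the KL divergence between diagonal Gaussians (as in Vilnis & McCallum's Gaussian embeddings), not necessarily the specific metrics this paper proposes. A narrow "dog" density nested inside a broad "animal" density yields a small KL(dog ∥ animal) but a large KL(animal ∥ dog), so the asymmetry encodes the direction of entailment.

```python
import math

def kl_diag_gaussians(mu0, var0, mu1, var1):
    """KL( N(mu0, diag(var0)) || N(mu1, diag(var1)) ) for diagonal Gaussians."""
    kl = 0.0
    for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1):
        kl += 0.5 * (math.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0)
    return kl

# Hypothetical 2-D embeddings: "dog" is a narrow density centred inside
# the broad "animal" density (toy values, not learned from data).
dog    = ([0.0, 0.0], [0.1, 0.1])
animal = ([0.0, 0.0], [1.0, 1.0])

kl_dog_animal = kl_diag_gaussians(*dog, *animal)    # small: dog fits inside animal
kl_animal_dog = kl_diag_gaussians(*animal, *dog)    # large: animal spills outside dog
print(kl_dog_animal, kl_animal_dog)
```

Scoring entailment by thresholding such an asymmetric divergence is one standard way density containment is operationalized; the paper's contribution, per the abstract, is in the loss functions, distance metrics, and negative sampling used to train these densities on hierarchical data.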