Challenges and Solutions with Alignment and Enrichment of Word Embedding Models

Published: 01 Jan 2017, Last Modified: 20 May 2025. NLDB 2017. License: CC BY-SA 4.0.
Abstract: Word embedding models offer continuous vector representations that can capture rich semantics of word co-occurrence patterns. Although these models have improved the state of the art on a number of NLP tasks, many open research questions remain. We study the semantic consistency and alignment of these models and show that their local properties are sensitive to even slight variations in the training datasets and parameters. We propose a solution that improves the alignment of different word embedding models by leveraging carefully generated synthetic data points. Our approach leads to substantial improvements in recovering consistent and richer embeddings of local semantics.
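The abstract does not detail the alignment procedure, so as background, a minimal sketch of a standard baseline for aligning two embedding spaces over a shared vocabulary is orthogonal Procrustes: find the rotation that best maps one model's vectors onto the other's. All names and the synthetic matrices below are illustrative assumptions, not the paper's method.

```python
import numpy as np

# Illustrative setup (hypothetical data): X holds embeddings from model A,
# Y holds embeddings for the same words from model B, which here is a hidden
# rotation of A plus a little noise -- mimicking two runs that differ only
# slightly in training conditions.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))                        # words x dimensions, model A
R_true = np.linalg.qr(rng.standard_normal((50, 50)))[0]   # hidden orthogonal map
Y = X @ R_true + 0.01 * rng.standard_normal((100, 50))    # model B

# Orthogonal Procrustes: minimize ||X W - Y||_F over orthogonal W.
# The closed-form solution is W = U V^T, from the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

aligned = X @ W
print(np.linalg.norm(aligned - Y), "<", np.linalg.norm(X - Y))
```

After alignment, nearest-neighbor structure can be compared directly across the two spaces; the residual norm drops sharply relative to the unaligned matrices.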