- Keywords: word embedding, natural language processing
- TL;DR: Learn Interpretable Word Embeddings Efficiently with von Mises-Fisher Distribution
- Abstract: Word embedding plays a key role in various tasks of natural language processing. However, the dominant word embedding models don't explain what information is carried with the resulting embeddings. To generate interpretable word embeddings we intend to replace the word vector with a probability density distribution. The insight here is that if we regularize the mixture distribution of all words to be uniform, then we can prove that the inner product between word embeddings represent the point-wise mutual information between words. Moreover, our model can also handle polysemy. Each word's probability density distribution will generate different vectors for its various meanings. We have evaluated our model in several word similarity tasks. Results show that our model can outperform the dominant models consistently in these tasks.