Abstract: Learning word representations from large corpora relies on the distributional hypothesis that words occurring in similar contexts tend to have similar meanings. Recent work has shown that word representations learnt in this manner lack sentiment information, which, fortunately, can be incorporated from external knowledge sources. Our work addresses the question: can affect lexica improve the word representations learnt from a corpus? We propose techniques to incorporate affect lexica, which capture fine-grained information about a word's psycholinguistic and emotional orientation, into the training of the Word2Vec SkipGram, Word2Vec CBOW, and GloVe methods using a joint learning approach. We use affect scores from Warriner's affect lexicon to regularize the vector representations learnt from an unlabelled corpus. Our proposed method outperforms previously proposed methods on standard word similarity, outlier detection, and sentiment detection tasks. We also demonstrate the usefulness of our approach on a new task: predicting formality, frustration, and politeness in corporate communication.
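To make the joint learning idea concrete, below is a minimal sketch, assuming a SkipGram-with-negative-sampling objective augmented by a squared-error affect regularizer that ties each word vector to its Warriner valence/arousal/dominance (VAD) scores through a learned linear projection. The regularizer form, the projection `P`, and the weight `lam` are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch only: one joint SGD step combining SkipGram negative sampling with an
# affect regularizer lam * ||P^T v_center - vad[center]||^2. Array shapes and the
# stand-in VAD scores are placeholders for the Warriner lexicon values.
import numpy as np

rng = np.random.default_rng(0)
V, D = 1000, 50                              # vocabulary size, embedding dimension
W_in = rng.normal(scale=0.1, size=(V, D))    # target ("input") vectors
W_out = rng.normal(scale=0.1, size=(V, D))   # context ("output") vectors
P = rng.normal(scale=0.1, size=(D, 3))       # embedding -> VAD projection (assumed)
vad = rng.uniform(1, 9, size=(V, 3))         # stand-in for Warriner VAD ratings
lam, lr = 0.1, 0.025                         # regularization weight, learning rate
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# One (center, context) pair with sampled negatives.
center, context = 3, 17
negatives = rng.integers(0, V, size=5)

v_c = W_in[center]
grad_c = np.zeros(D)

# SkipGram with negative sampling: pull the observed context vector closer,
# push sampled negative context vectors away.
g_pos = sigmoid(v_c @ W_out[context]) - 1.0
grad_c += g_pos * W_out[context]
W_out[context] -= lr * g_pos * v_c
for neg in negatives:
    g_neg = sigmoid(v_c @ W_out[neg])
    grad_c += g_neg * W_out[neg]
    W_out[neg] -= lr * g_neg * v_c

# Affect regularizer: nudge the projected embedding toward the lexicon scores
# (in practice this would apply only to words covered by the lexicon).
residual = v_c @ P - vad[center]             # shape (3,)
grad_c += 2.0 * lam * (P @ residual)
P -= lr * 2.0 * lam * np.outer(v_c, residual)

W_in[center] -= lr * grad_c
```

Analogous regularization terms could be added to the CBOW and GloVe objectives; the key design choice is that corpus statistics and lexicon scores are fitted jointly rather than the embeddings being retrofitted after training.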
TL;DR: Enriching word embeddings with affect information improves their performance on sentiment prediction tasks.
Keywords: Affect lexicon, word embeddings, Word2Vec, GloVe, WordNet, joint learning, sentiment analysis, word similarity, outlier detection, affect prediction
Data: [SST](https://paperswithcode.com/dataset/sst)