Learning Document Embeddings by Predicting N-grams for Sentiment Classification of Long Movie Reviews
Bofang Li, Tao Liu, Xiaoyong Du, Deyuan Zhang, Zhe Zhao
Feb 08, 2016 (modified: Feb 08, 2016), ICLR 2016 workshop submission, readers: everyone
Abstract: Bag-of-ngram based methods still achieve state-of-the-art results on tasks such as sentiment classification of long movie reviews, even though they partially lose semantic information. Many document embedding methods have been proposed to capture semantics, but they still cannot outperform bag-of-ngram based methods on this task. In this paper, we modify the architecture of the recently proposed Paragraph Vector so that it learns document vectors by predicting not only words but also n-gram features. Our model captures both semantics and word order in documents while preserving the expressive power of the learned vectors. Experimental results on the IMDB movie review dataset show that, owing to these advantages, our model outperforms previous deep learning models and bag-of-ngram based models.
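
To make the idea of predicting n-gram features from a document vector more concrete, the following is a minimal sketch of how (document, target) training pairs might be generated. It is an illustration only, not the authors' implementation: the function names `extract_ngrams` and `training_pairs`, the underscore-joined n-gram representation, and the `max_n` parameter are all assumptions introduced here, and the abstract does not specify how the n-gram targets are actually built or trained.

```python
from typing import List, Tuple


def extract_ngrams(tokens: List[str], max_n: int = 3) -> List[str]:
    """Return all contiguous n-grams of length 1..max_n, joined with '_'.

    Hypothetical helper: the joining convention is an assumption.
    """
    ngrams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            ngrams.append("_".join(tokens[i:i + n]))
    return ngrams


def training_pairs(docs: List[List[str]], max_n: int = 3) -> List[Tuple[int, str]]:
    """Pair each document id with every word and n-gram it contains.

    In a Paragraph Vector style model, the vector for doc_id would be
    trained to predict each target; the training step itself is omitted.
    """
    pairs = []
    for doc_id, tokens in enumerate(docs):
        for target in extract_ngrams(tokens, max_n):
            pairs.append((doc_id, target))
    return pairs


# Two toy "reviews", already tokenized.
docs = [["not", "good"], ["very", "good", "movie"]]
pairs = training_pairs(docs, max_n=2)
```

In this toy example the first document contributes the bigram target `not_good` in addition to the unigrams `not` and `good`, which hints at how n-gram targets could carry word-order information that single words alone would miss.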