Effects of Pre-trained Word Embeddings on Text-based Deception Detection

David Nam; Jerin Yasmin; Farhana H. Zulkernine

Published: 01 Jan 2020, Last Modified: 11 Feb 2025
Venue: DASC/PiCom/CBDCom/CyberSciTech 2020
License: CC BY-SA 4.0

Abstract: With e-commerce transforming the way individuals and businesses trade, online reviews have become a major source of information for consumers. With 93% of shoppers relying on online reviews to make their purchasing decisions, the credibility of those reviews deserves careful consideration. While detecting deceptive text has proven challenging for humans, it has been shown that machines can be better at distinguishing between truthful and deceptive online information by applying pattern analysis to large amounts of data. In this work, we examine several popular pre-trained word embeddings (Word2Vec, GloVe, fastText) with deep neural network models (CNN, BiLSTM, CNN-BiLSTM) to determine the influence of the word embedding on the accuracy of deception detection. Some pre-trained word embeddings were shown to adversely affect classification accuracy compared with training the embeddings on the domain-specific data. By combining CNN and BiLSTM with the fastText pre-trained word embeddings, we achieved an accuracy of 88.8% on the hotel review dataset published by Ott et al. in 2011.
