Deep Learning Based Sentiment Analysis in a Code-Mixed English-Hindi and English-Bengali Social Media Corpus

Anupam Jamatia, Steve Durairaj Swamy, Björn Gambäck, Amitava Das, Swapan Debbarma

Published: 2020, Last Modified: 13 Nov 2024. Int. J. Artif. Intell. Tools 2020. CC BY-SA 4.0

Abstract: Sentiment analysis is a contextual analysis of text that identifies the social sentiment it expresses in order to better understand the source material. The article addresses sentiment analysis of an English-Hindi and English-Bengali code-mixed textual corpus collected from social media. Code-mixing is an amalgamation of multiple languages, which was previously associated mainly with spoken language. However, social media users also employ it to communicate in ways that tend to be somewhat casual. The coarse nature of social media text poses challenges for many language processing applications. Here, the focus is on the lower predictive performance of traditional machine learners compared to their deep learning counterparts, including the contextual language representation model BERT (Bidirectional Encoder Representations from Transformers), on the task of extracting user sentiment from code-mixed texts. Three deep learners (a BiLSTM CNN, a Double BiLSTM and an Attention-based model) attained accuracy 20–60% higher than traditional approaches on code-mixed data; for comparison, they were also tested on monolingual English data.