Abstract: Sentiment analysis is a common and challenging task in natural language processing (NLP). It is a widely studied area of research; it facilitates capturing public opinions about a topic, product, or service. There is much research that tackles English sentiment analysis. However, the research in the Arabic language is behind other high-resource languages. Recently, models such as bidirectional encoder representations from transformers (BERT) and generative pre-trained transformer (GPT) have been widely used in many NLP tasks; it significantly improved performance in NLP tasks, especially sentiment analysis. However, Arabic was not a priority in their development. Several models focusing on Arabic have recently begun to pave the way for the latest technologies, such as ARBERT, MARBERT, and others. We used multiple datasets for training and testing-ASAD-A Twitter-based Benchmark Arabic Sentiment Analysis Dataset, ArSarcasm-v2, and SemEval-2017. We propose an ensemble learning approach that combines the multilingual model(XLM-T) and the monolingual model(MARBERT) to overcome the intricacies of the Arabic language that are difficult to address with a single model. It also addresses the problem of imbalanced data using a combination of focal loss and label smoothing. The experiments showed that our ensemble learning approach outperforms the state-of-the-art models on all the used datasets.
0 Replies
Loading