CluSent - Combining Semantic Expansion and De-Noising for Dataset-Oriented Sentiment Analysis of Short Texts

Felipe Viegas, Sérgio D. Canuto, Washington Cunha, Celso França, Cláudio Moisés Valiense de Andrade, Leonardo Rocha, Marcos André Gonçalves

Published: 2023, Last Modified: 15 Feb 2024, WebMedia 2023, Readers: Everyone

Abstract: The lack of sufficient information, mainly in short texts, is a major challenge to building effective sentiment models. Short texts can be enriched with more complex semantic relationships that better capture affective information, but this enrichment can have the undesired side effect of introducing noise into the data. This work proposes CluSent, a new strategy for customized dataset-oriented sentiment analysis that exploits CluWords, a powerful, recently proposed concept for representing semantically related words. CluSent tackles the issues of information shortage and noise by: (i) exploiting the semantic neighborhood of a given pre-trained word embedding to enrich document representation and (ii) introducing dataset-oriented filtering and weighting mechanisms to cope with noise, which take advantage of the polarity and intensity information from lexicons. In our experimental evaluation, considering 19 datasets, five state-of-the-art baselines (including modern transformer architectures), and two metrics, CluSent was the best method in 30 out of 38 possibilities (19 datasets × 2 metrics), with significant gains of over 14% over the strongest baselines.
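
The two steps named in the abstract, semantic expansion and lexicon-based de-noising, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy embeddings, the toy lexicon, the similarity threshold, the sign-conflict filter, and the weighting formula are all assumptions chosen only to show the general idea of expanding a word to its embedding neighbors and then filtering and weighting those neighbors by polarity.

```python
import numpy as np

# Hypothetical toy data: 2-d "embeddings" and a polarity lexicon in [-1, 1].
# None of these values come from the paper.
EMBEDDINGS = {
    "good": np.array([1.0, 0.1]),
    "great": np.array([0.9, 0.2]),
    "bad": np.array([0.95, 0.15]),  # close to "good" in embedding space
    "movie": np.array([0.0, 1.0]),
}
LEXICON = {"good": 0.7, "great": 0.9, "bad": -0.8}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def expand(word, sim_threshold=0.9):
    """Return {neighbor: weight} for one word.

    Step (i): keep neighbors whose cosine similarity reaches the threshold.
    Step (ii): drop neighbors whose lexicon polarity has the opposite sign
    (assumed filter), and scale the rest by lexicon intensity (assumed weight).
    """
    if word not in EMBEDDINGS:
        return {word: 1.0}  # out-of-vocabulary words are kept unchanged
    base_polarity = LEXICON.get(word, 0.0)
    expanded = {}
    for other, vec in EMBEDDINGS.items():
        sim = cosine(EMBEDDINGS[word], vec)
        if sim < sim_threshold:
            continue  # not in the semantic neighborhood
        polarity = LEXICON.get(other, 0.0)
        if base_polarity * polarity < 0:
            continue  # de-noising: conflicting sentiment
        expanded[other] = sim * (1.0 + abs(polarity))
    return expanded


def represent(tokens):
    """Sum the expanded weights of all tokens into one sparse document vector."""
    doc = {}
    for token in tokens:
        for term, weight in expand(token).items():
            doc[term] = doc.get(term, 0.0) + weight
    return doc
```

In this toy setup, `expand("good")` keeps "good" and "great" but drops "bad": it is a close embedding neighbor, yet its lexicon polarity has the opposite sign. This is the kind of noise the filtering step is meant to remove.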
