CluSent – Combining Semantic Expansion and De-Noising for Dataset-Oriented Sentiment Analysis of Short Texts

Anonymous

16 Nov 2021 (modified: 05 May 2023), ACL ARR 2021 November Blind Submission, Readers: Everyone

Abstract: Insufficient information, especially in short texts, is a major challenge to building effective sentiment models. Short texts can be enriched with more complex semantic relationships that better capture affective information, but this enrichment may introduce noise into the data as an undesired side effect. In this work, we propose CluSent, a new strategy for customized dataset-oriented sentiment analysis that exploits CluWords, a powerful, recently proposed concept for representing semantically related words. CluSent tackles both information shortage and noise by: (i) exploiting the semantic neighborhood of a given pre-trained word embedding to enrich document representation, and (ii) introducing dataset-oriented filtering and weighting mechanisms to cope with noise, which take advantage of the polarity and intensity information from lexicons. In our experimental evaluation, considering 19 datasets, 5 state-of-the-art baselines (including modern transformer architectures), and two metrics, CluSent was the best method in 30 out of 38 possibilities (19 datasets × 2 metrics), with significant gains (over 14%) over the strongest baselines.
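
The two steps named in the abstract, semantic expansion followed by lexicon-based de-noising, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy embeddings, the lexicon scores, the similarity threshold, the weighting formula, and the function names `expand`, `denoise`, and `represent` are all assumptions made for illustration only.

```python
import math

# Toy 2-d embeddings and lexicon scores, invented for illustration only.
EMB = {
    "good": [1.0, 0.1],
    "great": [0.9, 0.2],
    "nice": [0.8, 0.3],
    "bad": [0.7, 0.6],    # deliberately close to "good" in this toy space, to act as noise
    "movie": [0.0, 1.0],
}
LEXICON = {"good": 0.6, "great": 0.9, "nice": 0.5, "bad": -0.7}  # polarity in [-1, 1]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def expand(word, threshold=0.8):
    """Step (i): semantic neighbourhood of `word` (similarity >= threshold)."""
    sims = {w: cosine(EMB[word], vec) for w, vec in EMB.items()}
    return {w: s for w, s in sims.items() if s >= threshold}


def denoise(word, neighbours):
    """Step (ii), assumed rule: drop neighbours whose lexicon polarity has the
    opposite sign to `word`, and boost the rest by their lexicon intensity."""
    anchor = LEXICON.get(word, 0.0)
    kept = {}
    for w, sim in neighbours.items():
        polarity = LEXICON.get(w, 0.0)
        if anchor * polarity < 0:  # opposite sentiment: treated as noise
            continue
        kept[w] = sim * (1.0 + abs(polarity))
    return kept


def represent(tokens, threshold=0.8):
    """Expanded, de-noised bag-of-words features for one short text."""
    features = {}
    for tok in tokens:
        if tok not in EMB:  # skip out-of-vocabulary tokens
            continue
        for w, weight in denoise(tok, expand(tok, threshold)).items():
            features[w] = features.get(w, 0.0) + weight
    return features


features = represent(["good", "movie"])
print(sorted(features))
```

In this toy setting, "bad" falls inside the embedding neighbourhood of "good" but is filtered out because its lexicon polarity has the opposite sign, which is the kind of noise the abstract describes. The actual filtering and weighting mechanisms in CluSent may differ substantially from this rule.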
