An Evaluation Dataset for Targeted Sentiment Analysis in Long-Form Chinese News Articles

Rui Chen, Tailai Peng, Xinran Xie, Dekun Lin, Zhe Cui, Zheng Chen

Published: 2024, Last Modified: 13 Nov 2024, ICANN (7) 2024, CC BY-SA 4.0

Abstract: Compared with the review domain, which offers abundant high-quality data for robust model evaluation, the news domain has relatively few datasets for the Targeted Sentiment Analysis (TSA) task, and each of them is dedicated to a single news subdomain. This limitation hinders cross-domain evaluation, particularly for long-form Chinese news. Moreover, the texts in conventional TSA datasets are short, so possible shifts in sentiment toward a target over the course of a lengthy text are neglected. To address these gaps, we propose a scheme that annotates sentiment toward targets quantitatively from a full-text perspective. We then introduce CNTSenti, a long-form Chinese news evaluation dataset comprising 2,589 articles across five subfields, with an average length of 1,172 words. In addition, we present a domain adaptation strategy to improve cross-domain feature transfer, which incorporates target-guided windows, a prompt-based sentiment distribution alignment loss function, and a contrastive-learning-based feature transfer mechanism. Extensive experiments demonstrate the effectiveness of our approach and the challenging nature of the CNTSenti dataset.
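The abstract does not say how the target-guided windows are constructed. As a hedged illustration only, assuming a window is simply a span of characters around each mention of the target, with overlapping spans merged, target-centred context could be pulled out of a long article as sketched below. The function name `target_windows` and the `radius` parameter are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def target_windows(text: str, target: str, radius: int = 50) -> List[str]:
    """Hypothetical sketch: return character windows of +/- `radius` around
    every occurrence of `target` in `text`, merging windows that overlap."""
    if not target:
        # An empty target would match at every position and never advance.
        raise ValueError("target must be a non-empty string")
    spans: List[Tuple[int, int]] = []
    start = text.find(target)
    while start != -1:
        lo = max(0, start - radius)
        hi = min(len(text), start + len(target) + radius)
        if spans and lo <= spans[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            spans[-1] = (spans[-1][0], max(spans[-1][1], hi))
        else:
            spans.append((lo, hi))
        # Continue searching after the current mention.
        start = text.find(target, start + len(target))
    return [text[lo:hi] for lo, hi in spans]


if __name__ == "__main__":
    # Toy article: the target appears at the start and near the end.
    article = "华为发布新手机。" + "其他新闻。" * 20 + "分析师认为华为前景良好。"
    for window in target_windows(article, "华为", radius=8):
        print(window)
```

Character offsets are used here rather than whitespace tokens because written Chinese has no spaces between words; a model-based pipeline would more likely measure windows in the tokenizer's subword units.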