Abstract: Natural language understanding (NLU) is a task that enables machines to understand human language.
Some tasks, such as stance detection and sentiment analysis, are closely related to individual subjective perspectives and are thus termed individual-level NLU.
Previously, these tasks have often been simplified to text-level NLU tasks, ignoring individual factors.
This simplification not only makes inference difficult and unexplainable but also often results in a large number of label errors when creating datasets.
To address these limitations, we propose a new NLU annotation guideline based on individual-level factors.
Specifically, we incorporate other posts by the same individual and annotate that individual's subjective perspective only after considering all of their posts. We use this guideline to expand and re-annotate stance detection and topic-based sentiment analysis datasets. We find error rates as high as 31.7\% and 23.3\% in the samples from the two datasets, respectively.
We further conduct experiments with large language models on the re-annotated datasets and find that they perform well on both datasets once individual factors are added. Both GPT-4o and Llama3-70B achieve an accuracy greater than 87\% on the new datasets. We also verify the effectiveness of individual factors through ablation studies. We call on future researchers to include individual factors when creating such datasets. Our re-annotated datasets can be found at https://anonymous.4open.science/r/Individual-NLU-A0DE.
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: stance detection; emotion detection and analysis
Contribution Types: Data resources, Data analysis
Languages Studied: English
Submission Number: 5253