Abstract: In this study, we present ThatiAR, the first large dataset for subjectivity detection in Arabic, consisting of ~3.6K manually annotated sentences and GPT-4o-based explanations. In addition, we include instructions (in both English and Arabic) to facilitate LLM-based fine-tuning. We provide an in-depth analysis of the dataset and the annotation process, along with extensive benchmark results covering PLMs and LLMs. Our analysis of the annotation process highlights that annotators were strongly influenced by their political, cultural, and religious backgrounds, especially at the beginning of the annotation process. The experimental results suggest that LLMs with in-context learning provide better performance. We release the dataset and resources to the community.
External IDs: dblp:conf/icwsm/SuwailehHHZA25