EmoBench-UA: A Benchmark Dataset for Emotion Detection in Ukrainian

ACL ARR 2025 May Submission3752 Authors

19 May 2025 (modified: 03 Jul 2025)ACL ARR 2025 May SubmissionEveryoneRevisionsBibTeXCC BY 4.0
Abstract: While Ukrainian NLP has seen progress in many texts processing tasks, emotion classification remains an underexplored area with no publicly available benchmark to date. In this work, we introduce **EmoBench-UA**, the first annotated dataset for emotion detection in Ukrainian texts. Our annotation schema is adapted from the previous English-centric works on emotion detection (Mohammad et al., 2018; Mohammad, 2022) guidelines. The dataset was created through crowdsourcing using the Toloka.ai platform ensuring high-quality of the annotation process. Then, we evaluate a range of approaches on the collected dataset, starting from linguistic-based baselines, synthetic data translated from English, to large language models (LLMs). Our findings highlight the challenges of emotion classification in non-mainstream languages like Ukrainian and emphasize the need for further development of Ukrainian-specific models and training resources.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: corpus creation, benchmarking, datasets for low resource languages
Contribution Types: Approaches to low-resource settings, Data resources
Languages Studied: Ukrainian
Submission Number: 3752
Loading