MarathiEmoExplain: A Dataset for Sentiment, Emotion, and Explanation in Low-Resource Marathi

ACL ARR 2025 May Submission5476 Authors

20 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0
Abstract: Marathi, the third most widely spoken language in India with over 83 million native speakers, remains significantly underrepresented in Natural Language Processing (NLP) research. While sentiment analysis has achieved substantial progress in high-resource languages such as English, Chinese, and Hindi, available Marathi datasets are limited to coarse sentiment labels and lack fine-grained emotional categorization or interpretability through explanations. To address this gap, we present a new annotated dataset of 10,762 Marathi sentences, each labeled with sentiment (positive, negative, or neutral), emotion (joy, anger, surprise, disgust, sadness, fear, or neutral), and a corresponding natural language justification. Justifications are written in English and generated using GPT-4 under a human-in-the-loop framework to ensure label fidelity and contextual alignment. Extensive experiments with both classical and transformer-based models demonstrate the effectiveness of the dataset for interpretable affective computing in a low-resource language setting, offering a benchmark for future research in multilingual and explainable NLP.
Paper Type: Short
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Interpretability and Explainability, Low-resource Learning, Emotion Recognition, Sentiment Analysis, Resources and Evaluation
Contribution Types: Data resources, Data analysis
Languages Studied: Marathi, English
Submission Number: 5476
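
The annotation scheme described in the abstract (one sentiment label, one emotion label, and an English justification per Marathi sentence) could be represented roughly as in the sketch below. The label sets are taken from the abstract; the class name, field names, and validation rules are assumptions made for illustration and are not the authors' released schema. The sample record is invented and is not drawn from the dataset.

```python
from dataclasses import dataclass

# Label sets as listed in the abstract.
SENTIMENTS = {"positive", "negative", "neutral"}
EMOTIONS = {"joy", "anger", "surprise", "disgust", "sadness", "fear", "neutral"}


@dataclass(frozen=True)
class AnnotatedSentence:
    """One dataset entry. Field names are hypothetical, not the authors' schema."""

    text: str           # Marathi sentence
    sentiment: str      # one of SENTIMENTS
    emotion: str        # one of EMOTIONS
    justification: str  # English explanation of the assigned labels

    def __post_init__(self):
        # Reject labels outside the sets named in the abstract.
        if self.sentiment not in SENTIMENTS:
            raise ValueError(f"unknown sentiment: {self.sentiment!r}")
        if self.emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion: {self.emotion!r}")
        # Assumed rule: every entry carries a non-empty justification.
        if not self.justification.strip():
            raise ValueError("justification must be non-empty")


# Invented example for illustration only (not taken from the dataset).
example = AnnotatedSentence(
    text="मला आज खूप आनंद झाला.",
    sentiment="positive",
    emotion="joy",
    justification="The speaker says they felt very happy today.",
)
```

Validating labels at construction time is one possible design choice; it surfaces malformed rows when loading the data rather than later during training.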