Culture Matters in Toxic Language Detection in Persian

ACL ARR 2025 February Submission916 Authors

11 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, Readers: Everyone, License: CC BY 4.0
Abstract: Toxic language detection is crucial for creating safer online environments and limiting the spread of harmful content. The task remains under-explored in Persian. This work compares several methods for it: fine-tuning, data enrichment, zero-shot and few-shot learning, and cross-lingual transfer learning. A key finding concerns the impact of cultural context on transfer learning: transferring from a language whose speakers share cultural similarities with Persian speakers yields better results, whereas the improvement is smaller when the source language comes from a culturally distinct country.
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: Toxic Language Detection, Distant supervision, Low Resource Language, Large language models (LLMs), Transfer learning, Cross-cultural NLP
Contribution Types: Approaches to low-resource settings
Languages Studied: Persian (Farsi), Arabic, English, Indonesian
Submission Number: 916