Abstract: The proliferation of offensive online content across diverse languages necessitates culturally aware NLP solutions. While Cross-Lingual Transfer Learning (CLTL) shows promise in other NLP tasks, its application to offensive language detection overlooks crucial cultural nuances in how offensiveness is perceived. This work investigates the effectiveness of CLTL for offensive language detection, considering both linguistic and cultural factors. Specifically, we investigate transfer learning across 105 language pairs and report three key findings. First, training exclusively on English data impedes performance in certain target languages. Second, linguistic proximity between languages does not have a significant impact on transferability. Third, there is a significant correlation between cultural distance and performance: each unit increase in cultural distance corresponds to an increase of 0.31 in AUC. These findings emphasize the limitations of English-centric approaches and highlight the need to integrate cultural context into NLP solutions for offensive language detection.
Paper Type: short
Research Area: Computational Social Science and Cultural Analytics
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Data analysis, Position papers
Languages Studied: Albanian, Arabic, Danish, English, Estonian, German, Greek, Italian, Latvian, Portuguese, Russian, Turkish, Surzhyk, Chinese, Hindi
Preprint Status: We plan to release a non-anonymous preprint in the next two months (i.e., during the reviewing process).
A1: yes
A1 Elaboration For Yes Or No: section 7
A2: yes
A2 Elaboration For Yes Or No: section 7
A3: yes
A3 Elaboration For Yes Or No: abstract and section 1
B: yes
B1: yes
B1 Elaboration For Yes Or No: section 3
B2: yes
B2 Elaboration For Yes Or No: appendix D
B3: yes
B3 Elaboration For Yes Or No: appendix D
B4: n/a
B4 Elaboration For Yes Or No: We use publicly available datasets for offensive language detection
B5: yes
B5 Elaboration For Yes Or No: appendix D
B6: yes
B6 Elaboration For Yes Or No: Section 3 and appendix D
C: yes
C1: yes
C1 Elaboration For Yes Or No: Appendix A
C2: yes
C2 Elaboration For Yes Or No: Section 3 and appendix A
C3: yes
C3 Elaboration For Yes Or No: Sections 3, 4, 5 and appendix E
C4: yes
C4 Elaboration For Yes Or No: section 3 and appendix A
D: no
D1: n/a
D2: n/a
D3: n/a
D4: n/a
D5: n/a
E: yes
E1: n/a
E1 Elaboration For Yes Or No: We used ChatGPT to catch grammatical errors