Abstract: Despite remarkable advances in natural language processing, building effective systems for low-resource languages remains a formidable challenge: performance typically lags far behind that of high-resource languages because of data scarcity and insufficient linguistic resources. Cross-lingual knowledge transfer is a promising way to address this challenge by leveraging resources from high-resource languages. In this paper, we investigate methods for transferring linguistic knowledge from high-resource to low-resource languages in settings where the number of labeled training instances is in the hundreds. We focus on sentence-level and word-level tasks. We examine three approaches to cross-lingual knowledge transfer: (a) augmentation in hidden layers, (b) token embedding transfer through token translation, and (c) a novel method for sharing token embeddings at hidden layers using Graph Neural Networks (GNNs). Experimental results on sentiment classification and NER for the low-resource languages Marathi, Bangla (Bengali), and Malayalam, with Hindi and English as the high-resource languages, demonstrate that our GNN-based approach outperforms existing methods, improving macro-F1 by 20 and 27 percentage points, respectively, over traditional transfer learning baselines such as multilingual and cross-lingual training. We also present a detailed analysis of the transfer mechanisms and identify key factors that contribute to successful knowledge transfer in this linguistic context. Our findings provide valuable insights for developing NLP systems for other low-resource languages.
Paper Type: Long
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: cross-lingual transfer, less-resourced languages, mixed language, multilingualism, multilingual evaluation
Contribution Types: Approaches to low-resource settings
Languages Studied: Marathi, Bangla, Malayalam, Hindi, English
Previous URL: https://openreview.net/forum?id=TSMJtKQ73a
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: No, I want the same area chair from our previous submission (subject to their availability).
Reassignment Request Reviewers: No, I want the same set of reviewers from our previous submission (subject to their availability)
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Section 3: Methodology
B2 Discuss The License For Artifacts: N/A
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: Section 3: Methodology
B4 Data Contains Personally Identifying Info Or Offensive Content: N/A
B5 Documentation Of Artifacts: Yes
B5 Elaboration: Section 4: Experiments and Results
B6 Statistics For Data: Yes
B6 Elaboration: Section 4.1: Dataset
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Section 4.2: Implementation Details
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Section 4: Experiments and Results
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 4: Experiments and Results
C4 Parameters For Packages: Yes
C4 Elaboration: Section 4: Experiments and Results
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: Acknowledgements
Author Submission Checklist: yes
Submission Number: 799