BhashaSetu: Cross-Lingual Knowledge Transfer from High-Resource to Low-Resource Language

ACL ARR 2025 July Submission 799 Authors

28 Jul 2025 (modified: 20 Aug 2025) | ACL ARR 2025 July Submission | CC BY 4.0

Abstract: Despite remarkable advances in natural language processing, developing effective systems for low-resource languages remains a formidable challenge: performance typically lags far behind that of high-resource counterparts because of data scarcity and insufficient linguistic resources. Cross-lingual knowledge transfer has emerged as a promising way to address this challenge by leveraging resources from high-resource languages. In this paper, we investigate methods for transferring linguistic knowledge from high-resource languages to low-resource languages in settings where the number of labeled training instances is in the hundreds. We focus on sentence-level and word-level tasks. We examine three approaches to cross-lingual knowledge transfer: (a) augmentation in hidden layers, (b) token embedding transfer through token translation, and (c) a novel method for sharing token embeddings at hidden layers using Graph Neural Networks (GNNs). Experimental results on sentiment classification and named entity recognition (NER) for the low-resource languages Marathi, Bangla (Bengali), and Malayalam, using the high-resource languages Hindi and English, demonstrate that our novel GNN-based approach significantly outperforms existing methods, improving macro-F1 score by 20 and 27 percentage points, respectively, over traditional transfer learning baselines such as multilingual and cross-lingual training. We also present a detailed analysis of the transfer mechanisms and identify key factors that contribute to successful knowledge transfer in this linguistic context. Our findings provide valuable insights for developing NLP systems for other low-resource languages.

Paper Type: Long

Research Area: Multilingualism and Cross-Lingual NLP

Research Area Keywords: cross-lingual transfer, less-resourced languages, mixed language, multilingualism, multilingual evaluation

Contribution Types: Approaches to low-resource settings

Languages Studied: Marathi, Bangla, Malayalam, Hindi, English

Previous URL: https://openreview.net/forum?id=TSMJtKQ73a

Explanation Of Revisions PDF: pdf

Reassignment Request Area Chair: No, I want the same area chair from our previous submission (subject to their availability).

Reassignment Request Reviewers: No, I want the same set of reviewers from our previous submission (subject to their availability)

A1 Limitations Section: This paper has a limitations section.

A2 Potential Risks: N/A

B Use Or Create Scientific Artifacts: Yes

B1 Cite Creators Of Artifacts: Yes

B1 Elaboration: Section 3: Methodology

B2 Discuss The License For Artifacts: N/A

B3 Artifact Use Consistent With Intended Use: Yes

B3 Elaboration: Section 3: Methodology

B4 Data Contains Personally Identifying Info Or Offensive Content: N/A

B5 Documentation Of Artifacts: Yes

B5 Elaboration: Section 4: Experiments and Results

B6 Statistics For Data: Yes

B6 Elaboration: Section 4.1: Dataset

C Computational Experiments: Yes

C1 Model Size And Budget: Yes

C1 Elaboration: Section 4.2: Implementation Details

C2 Experimental Setup And Hyperparameters: Yes

C2 Elaboration: Section 4: Experiments and Results

C3 Descriptive Statistics: Yes

C3 Elaboration: Section 4: Experiments and Results

C4 Parameters For Packages: Yes

C4 Elaboration: Section 4: Experiments and Results

D Human Subjects Including Annotators: No

D1 Instructions Given To Participants: N/A

D2 Recruitment And Payment: N/A

D3 Data Consent: N/A

D4 Ethics Review Board Approval: N/A

D5 Characteristics Of Annotators: N/A

E AI Assistants In Research Or Writing: Yes

E1 Information About Use Of AI Assistants: Yes

E1 Elaboration: Acknowledgements

Author Submission Checklist: yes

Submission Number: 799
