MLCL: A Framework for Reducing Language Imbalance in Sino-Tibetan Languages through Adapter Structures

Published: 05 Sept 2024, Last Modified: 16 Oct 2024, ACML 2024 Conference Track, CC BY 4.0
Keywords: Text Classification, Language Imbalance, Multilingual Pre-trained Model, Contrastive Learning, Comparative Linguistics, Sino-Tibetan Language Family
Abstract: Multilingual pre-trained models have been widely applied to natural language processing (NLP) tasks, including text classification. However, because the amount of available data varies greatly across languages, these models exhibit uneven performance, a phenomenon known as language imbalance. Existing work on mitigating language imbalance primarily draws on text and image data and neglects the auditory dimension of language; as a result, it offers only a partial remedy, failing to exploit the rich linguistic information conveyed through speech. To address this, we introduce MultiLingual Contrastive Learning (MLCL), a framework for reducing language imbalance. By incorporating concepts from comparative linguistics into neural networks, MLCL exploits the phonetic similarities among languages of the Sino-Tibetan family to mitigate language imbalance in multilingual pre-trained models. We evaluate our method on two synthetic datasets derived from the Flores200 and mms datasets across multiple models. The experimental results show that our model surpasses all baseline models on language imbalance metrics.
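The abstract describes a cross-lingual contrastive objective that pulls together representations of related Sino-Tibetan languages. The exact MLCL loss, adapter placement, and phonetic features are not specified here, so the following is only a minimal sketch of a generic InfoNCE-style alignment loss under assumed names (embed_a, embed_b, temperature, encoder), not the authors' implementation.

```python
# Sketch of a cross-lingual contrastive (InfoNCE-style) alignment loss.
# All names and hyperparameters below are illustrative assumptions.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(embed_a: torch.Tensor,
                               embed_b: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Pull together embeddings of parallel sentences from two related
    languages (row i of embed_a is parallel to row i of embed_b) and push
    apart non-parallel pairs within the batch."""
    a = F.normalize(embed_a, dim=-1)       # [batch, dim]
    b = F.normalize(embed_b, dim=-1)       # [batch, dim]
    logits = a @ b.t() / temperature       # pairwise cosine similarities
    targets = torch.arange(a.size(0), device=a.device)
    # Symmetric InfoNCE: each sentence should retrieve its counterpart in both directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Hypothetical usage: embeddings from a multilingual encoder (possibly with adapters),
# pairing a high-resource language batch with a lower-resource Sino-Tibetan one.
# loss = contrastive_alignment_loss(encoder(zh_batch), encoder(low_resource_batch))
```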
Primary Area: Trustworthy Machine Learning (accountability, explainability, transparency, causality, fairness, privacy, robustness, autoML, etc.)
Student Author: Yes
Submission Number: 130