[Proposal-ML] Cross-Cultural Language Adaptation: Fine-Tuning Gemma 2 for Diverse Linguistic Contexts
Keywords: Multilingual Model, Large Language Model, Knowledge Transfer, Multilingual Neural Machine Translation
Abstract: This paper introduces a novel approach for enhancing translation quality for low-resource language pairs, focusing specifically on Chinese–Malay translation in the scientific domain. Leveraging Google's Gemma-2-9B model, we adopt a pivot-based Multilingual Neural Machine Translation (MNMT) strategy with English as the intermediary language. Our methodology applies Supervised Fine-Tuning (SFT) with Low-Rank Adaptation (LoRA) in a multi-step fine-tuning process to improve translation efficiency and accuracy. It includes curating a dataset from biology-specific scientific articles, supplemented with synthetic data, to strengthen the model's handling of domain-specific terminology and complex linguistic structures. To assess model performance, we employ automated metrics, including COMET, BLEU, BERTScore F1, and chrF, alongside human evaluation. Our findings demonstrate the potential of pivot-based MNMT methods to bridge low-resource language gaps in scientific knowledge, presenting a scalable solution that can extend to other languages and domains and foster inclusivity in multilingual communication.
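A minimal sketch of how the described LoRA-based SFT setup and pivot-style inference could look, assuming Hugging Face transformers and peft; the checkpoint name google/gemma-2-9b, the prompt template, the LoRA hyperparameters (r, lora_alpha, target_modules), and the sample sentence are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch only: attach LoRA adapters to Gemma-2-9B and run pivot-based
# (Chinese -> English -> Malay) generation. Assumes the adapters would first
# be trained via SFT on the curated biology corpus described in the abstract.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

MODEL_ID = "google/gemma-2-9b"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# Low-Rank Adaptation: train small rank-decomposition matrices on the
# attention projections instead of all 9B base parameters.
lora_config = LoraConfig(
    r=16,                # rank of the update matrices (assumed value)
    lora_alpha=32,       # scaling factor (assumed value)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only adapter weights are trainable

def translate(text: str, src: str, tgt: str) -> str:
    """One translation hop with a simple, assumed prompt template."""
    prompt = f"Translate the following {src} scientific text into {tgt}:\n{text}\n{tgt}:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256)
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Pivot-based MNMT at inference time: source -> English -> target.
zh_sentence = "细胞膜由磷脂双分子层构成。"          # sample biology sentence (assumed)
english = translate(zh_sentence, "Chinese", "English")  # first hop
malay = translate(english, "English", "Malay")          # second hop
print(malay)
```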
Submission Number: 47