Towards Confident Multilingual Generation from English-Centric LLMs: A Tuning-Free Approach

ACL ARR 2025 February Submission 402 Authors

07 Feb 2025 (modified: 09 May 2025); License: CC BY 4.0
Abstract: This paper introduces a new paradigm for aligning English-centric language models across languages through token perturbation. We propose two methods within this paradigm: Language-Aware Token Boosting (LATB), which adds a fixed perturbation to tokens of the desired language, and an adaptive variant, Adaptive Language-Aware Token Boosting (Adaptive-LATB), which adjusts the perturbation dynamically based on the model's confidence in the intended language. Extensive experiments show that both methods effectively enhance multilingual alignment: compared with a fine-tuned baseline, they reduce language confusion and improve summarization quality without requiring any additional fine-tuning. Our code will be made publicly available.
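The abstract does not specify implementation details, so the following is only a minimal sketch of the general token-boosting idea it describes, under stated assumptions: the target language is represented by a set of vocabulary ids (`target_ids`, hypothetical), LATB adds a fixed perturbation `delta` to those tokens' logits at each decoding step, and the adaptive variant scales the perturbation by how little probability mass the model currently places on the target language. The function names, the parameter `delta`, and the confidence measure are all illustrative assumptions, not the authors' exact method.

```python
import torch


def latb_logits(logits: torch.Tensor, target_ids: torch.Tensor,
                delta: float = 2.0) -> torch.Tensor:
    """LATB (sketch): add a fixed perturbation `delta` to the logits of
    target-language tokens before choosing the next token."""
    boosted = logits.clone()
    boosted[..., target_ids] += delta
    return boosted


def adaptive_latb_logits(logits: torch.Tensor, target_ids: torch.Tensor,
                         delta_max: float = 4.0) -> torch.Tensor:
    """Adaptive-LATB (sketch): shrink the perturbation when the model is
    already confident in the target language, i.e. when the probability
    mass on target-language tokens is high."""
    probs = torch.softmax(logits, dim=-1)
    # Assumed confidence measure: total probability on target-language ids.
    p_target = probs[..., target_ids].sum(dim=-1, keepdim=True)
    boosted = logits.clone()
    boosted[..., target_ids] += delta_max * (1.0 - p_target)
    return boosted


# Usage at one greedy decoding step (toy vocabulary of 8 tokens).
logits = torch.randn(1, 8)
target_ids = torch.tensor([2, 5, 7])  # hypothetical target-language token ids
next_token = adaptive_latb_logits(logits, target_ids).argmax(dim=-1)
```

Because the perturbation is applied only to the logits at decoding time, this kind of approach needs no gradient updates, which is consistent with the paper's tuning-free claim.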
Paper Type: Short
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: code-switching; mixed language; multilingualism; language change
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings (efficiency), Publicly available software and/or pre-trained models
Languages Studied: Russian, Chinese (Simplified), Japanese, French, Korean, Thai, Hindi, Arabic
Submission Number: 402