Abstract: This paper introduces a new paradigm of multilingual alignment for English-centric language models through token perturbation techniques. We propose two methods within this paradigm: Language-Aware Token Boosting (LATB), which directly adds perturbations to desired-language tokens, and its adaptive variant, Adaptive Language-Aware Token Boosting (Adaptive-LATB), which dynamically adjusts the perturbation strength based on the model's confidence in the intended language. Extensive experiments show that our methods effectively enhance multilingual alignment, reducing language confusion and marginally improving summarization quality without requiring additional fine-tuning.
Our code is publicly available \footnote{\url{https://anonymous.4open.science/r/Language-Aware-Token-Boosting-Anonymous-7181}}.
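The abstract's two methods can be sketched at decoding time as logit adjustments. The following is a minimal illustration, not the authors' implementation: the paper does not specify the exact formulas, so the fixed bias `delta`, the confidence measure (probability mass on target-language tokens), and the scaling rule `delta_max * (1 - p_target)` are all assumptions made here for concreteness.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a logit vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def latb_boost(logits, target_ids, delta=2.0):
    """LATB sketch (assumed form): add a fixed bias to the logits of
    tokens belonging to the desired language."""
    boosted = logits.copy()
    boosted[target_ids] += delta
    return boosted

def adaptive_latb_boost(logits, target_ids, delta_max=2.0):
    """Adaptive-LATB sketch (assumed form): scale the bias by how little
    probability mass the model already places on the target language,
    so confident steps receive a weaker perturbation."""
    p_target = softmax(logits)[target_ids].sum()
    return latb_boost(logits, target_ids, delta=delta_max * (1.0 - p_target))

# Toy example: a 4-token vocabulary where ids 0 and 1 are target-language tokens.
logits = np.array([1.0, 0.5, 2.0, -1.0])
fixed = latb_boost(logits, [0, 1])          # always adds the full delta
adaptive = adaptive_latb_boost(logits, [0, 1])  # adds a confidence-scaled delta
```

In this toy run the adaptive bias lands strictly between zero and the fixed `delta_max`, which captures the stated behavior of Adaptive-LATB: perturbation strength shrinks as the model's confidence in the intended language grows.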
Paper Type: Short
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: code-switching, mixed language, multilingualism, language change
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings - efficiency, Publicly available software and/or pre-trained models
Languages Studied: Russian, Simplified Chinese, Japanese, French, Korean, Thai, Hindi, Arabic
Submission Number: 1681