Keywords: code-switching, mixed language, multilingualism, language change
Abstract: Large language models (LLMs) sometimes exhibit language confusion when generating non-English text. Existing approaches typically rely on fine-tuning to mitigate this issue. In contrast, we propose a tuning-free paradigm for reducing language confusion. Within this paradigm, we introduce two methods: Language-Aware Token Boosting (LATB), which applies targeted perturbations to tokens associated with the desired language, and Adaptive Language-Aware Token Boosting (Adaptive-LATB), which dynamically adjusts these perturbations based on the model’s confidence in the intended language. Experiments demonstrate that our methods effectively reduce language confusion and improve multilingual alignment while maintaining summarization quality, without requiring any additional fine-tuning. Our code is publicly available.\footnote{\url{https://anonymous.4open.science/r/Language-Aware-Token-Boosting-Anonymous-7181}}
Paper Type: Short
Research Area: Multilinguality and Language Diversity
Research Area Keywords: Multilingualism and Cross-Lingual NLP
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings / efficiency
Languages Studied: Russian, Chinese Simplified, Japanese, French, Korean, Thai, Hindi, Arabic
Submission Number: 3307
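The abstract describes decoding-time perturbations of logits for tokens associated with the desired output language. Purely as an illustration of that general idea (not the paper's actual implementation, which is in the linked repository; the boost magnitude, the softmax-mass notion of "confidence", the token-to-language assignment, and all function names below are assumptions made for this sketch), a minimal version could look like the following Python:

import numpy as np

def latb_boost(logits, target_lang_token_ids, boost=2.0):
    # LATB-style sketch: add a fixed bias to the logits of tokens
    # assumed to belong to the desired output language.
    boosted = logits.copy()
    boosted[target_lang_token_ids] += boost
    return boosted

def adaptive_latb_boost(logits, target_lang_token_ids, max_boost=2.0):
    # Adaptive sketch: shrink the bias when the model already places
    # most of its probability mass on target-language tokens.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    lang_confidence = probs[target_lang_token_ids].sum()
    boost = max_boost * (1.0 - lang_confidence)
    boosted = logits.copy()
    boosted[target_lang_token_ids] += boost
    return boosted

# Toy usage: 6-token vocabulary, tokens 3-5 assumed to be target-language tokens.
logits = np.array([1.2, 0.8, 0.5, 0.3, 0.1, -0.2])
target_ids = np.array([3, 4, 5])
print(latb_boost(logits, target_ids))
print(adaptive_latb_boost(logits, target_ids))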