Abstract: This paper introduces a new paradigm of multilingual alignment for English-centric language models through token perturbation techniques. We propose two methods within this paradigm: Language-Aware Token Boosting (LATB), which directly adds perturbations to desired-language tokens, and its adaptive variant, Adaptive Language-Aware Token Boosting (Adaptive-LATB), which dynamically adjusts the perturbation strength based on the model's confidence in the intended language. Extensive experiments show that our methods effectively enhance multilingual alignment, reducing language confusion and marginally improving summarization quality without requiring additional fine-tuning.
Our code is publicly available \footnote{\url{https://anonymous.4open.science/r/Language-Aware-Token-Boosting-Anonymous-7181}}.
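The abstract's two methods can be sketched at decoding time as logit adjustments. The following is a minimal illustration, not the authors' implementation: the paper does not specify the exact formulas, so the fixed bias `delta`, the confidence measure (probability mass on target-language tokens), and the scaling rule `delta_max * (1 - p_target)` are all assumptions made here for concreteness.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a logit vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def latb_boost(logits, target_ids, delta=2.0):
    """LATB sketch (assumed form): add a fixed bias to the logits of
    tokens belonging to the desired language."""
    boosted = logits.copy()
    boosted[target_ids] += delta
    return boosted

def adaptive_latb_boost(logits, target_ids, delta_max=2.0):
    """Adaptive-LATB sketch (assumed form): scale the bias by how little
    probability mass the model already places on the target language,
    so confident steps receive a weaker perturbation."""
    p_target = softmax(logits)[target_ids].sum()
    return latb_boost(logits, target_ids, delta=delta_max * (1.0 - p_target))

# Toy example: a 4-token vocabulary where ids 0 and 1 are target-language tokens.
logits = np.array([1.0, 0.5, 2.0, -1.0])
fixed = latb_boost(logits, [0, 1])          # always adds the full delta
adaptive = adaptive_latb_boost(logits, [0, 1])  # adds a confidence-scaled delta
```

In this toy run the adaptive bias lands strictly between zero and the fixed `delta_max`, which captures the stated behavior of Adaptive-LATB: perturbation strength shrinks as the model's confidence in the intended language grows.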
Paper Type: Short
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: code-switching, mixed language, multilingualism, language change
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings - efficiency, Publicly available software and/or pre-trained models
Languages Studied: Russian, Simplified Chinese, Japanese, French, Korean, Thai, Hindi, Arabic
Submission Number: 1681