Reinforcement Learning with Semantic Rewards Enables Low-Resource Language Expansion without Alignment Tax

ACL ARR 2026 January Submission4565 Authors

05 Jan 2026 (modified: 20 Mar 2026) · CC BY 4.0
Keywords: low-resource language, alignment tax, reinforcement learning, semantic rewards, semantic-space alignment, language expansion
Abstract: Extending large language models (LLMs) to low-resource languages often incurs an "alignment tax": improvements in the target language come at the cost of catastrophic forgetting in general capabilities. We argue that this trade-off arises from the rigidity of supervised fine-tuning (SFT), which enforces token-level surface imitation on narrow and biased data distributions. To address this limitation, we propose a semantic-space alignment paradigm powered by Group Relative Policy Optimization (GRPO), where the model is optimized using embedding-level semantic rewards rather than likelihood maximization. This objective encourages meaning preservation through flexible realizations, enabling controlled updates that reduce destructive interference with pretrained knowledge. We evaluate our approach on Tibetan–Chinese machine translation and Tibetan headline generation. Experiments show that our method acquires low-resource capabilities while markedly mitigating alignment tax, preserving general competence more effectively than SFT. Despite producing less rigid surface overlap, semantic RL yields higher semantic quality and preference in open-ended generation, and few-shot transfer results indicate that it learns more transferable and robust representations under limited supervision. Overall, our study demonstrates that reinforcement learning with semantic rewards provides a safer and more reliable pathway for inclusive low-resource language expansion.
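The abstract's core mechanism can be sketched in a few lines: score each sampled output by an embedding-level semantic reward (here, cosine similarity to a reference embedding), then compute GRPO's group-relative advantage by normalizing rewards within the sampled group. This is a minimal illustration, not the authors' implementation; the toy vectors stand in for outputs of a real sentence encoder, and the function names are ours.

```python
# Minimal sketch (assumptions, not the paper's code): embedding-level
# semantic rewards plus GRPO-style group-relative advantages.
# Toy vectors stand in for sentence-encoder embeddings.
import numpy as np

def semantic_reward(candidate_emb, reference_emb):
    """Cosine similarity between candidate and reference embeddings."""
    num = float(np.dot(candidate_emb, reference_emb))
    denom = float(np.linalg.norm(candidate_emb) * np.linalg.norm(reference_emb))
    return num / denom if denom > 0 else 0.0

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO baseline: standardize each sample's reward within its group."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Toy example: four sampled translations scored against one reference.
ref = np.array([1.0, 0.0, 1.0])
candidates = [
    np.array([1.0, 0.0, 1.0]),  # semantically identical
    np.array([0.9, 0.1, 1.1]),  # close paraphrase
    np.array([0.0, 1.0, 0.0]),  # unrelated output
    np.array([1.0, 0.5, 0.5]),  # partial overlap
]
rewards = [semantic_reward(c, ref) for c in candidates]
advs = group_relative_advantages(rewards)
# The identical candidate receives the largest advantage,
# the unrelated one the smallest; advantages sum to ~0 within the group.
```

In a full pipeline, these advantages would weight the policy-gradient update on each sampled sequence, so paraphrases that preserve meaning are rewarded even when their surface tokens diverge from the reference.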
Paper Type: Long
Research Area: Low-resource Methods for NLP
Research Area Keywords: Efficient/Low-Resource Methods for NLP, Multilingualism and Cross-Lingual NLP
Contribution Types: Reproduction study, Approaches to low-resource settings
Languages Studied: Tibetan, Chinese
Submission Number: 4565