Mix Data or Merge Models? Optimizing for Performance and Safety in Multilingual Contexts

ACL ARR 2025 May Submission 4344 Authors

19 May 2025 (modified: 03 Jul 2025) · CC BY 4.0
Abstract: Large Language Models (LLMs) are increasingly used worldwide for diverse applications. However, ensuring their safe use remains a complex challenge. To tackle this, safety is often embedded into models as a "behavior," yet it is frequently overfit to the harms prevalent in Western-centric datasets. In this work, we address this by systematically exploring the potential of model merging in a diverse multi-task setting: treating safety in LLMs as a "task" and combining models trained for safety-specific tasks with those trained for more general-purpose tasks, all within a multilingual context. We categorize our experiments into two primary groups, objective-based and language-based, according to the fine-tuning objective of the models being merged. Our results demonstrate that objective-based merging is significantly more effective than data mixing, yielding improvements of up to 8% in general performance and 10% in safety. We also find that language-based merging is highly effective: by merging monolingual models, we achieve a 4% increase in general performance and a 7% reduction in harm across all languages compared to the data mixing approach. Overall, our comprehensive study of model merging in the context of multilingual safety provides a useful framework for building strong, safe multilingual models without retraining.
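The abstract does not specify which merging operator the paper uses. For reference, a minimal sketch of linear weight interpolation, a standard model-merging baseline, is shown below in PyTorch. It assumes two checkpoints with identical architectures (e.g., a safety-tuned and a general-purpose model); all file paths and names are hypothetical.

    # Minimal sketch of linear weight merging, a common baseline.
    # Assumes both checkpoints share the same architecture and parameter names.
    import torch

    def linear_merge(state_dict_a, state_dict_b, alpha=0.5):
        """Interpolate two compatible state dicts: alpha * A + (1 - alpha) * B."""
        merged = {}
        for name, tensor_a in state_dict_a.items():
            tensor_b = state_dict_b[name]
            merged[name] = alpha * tensor_a + (1.0 - alpha) * tensor_b
        return merged

    # Hypothetical usage: checkpoint paths are placeholders.
    safety_sd = torch.load("safety_model.pt", map_location="cpu")
    general_sd = torch.load("general_model.pt", map_location="cpu")
    merged_sd = linear_merge(safety_sd, general_sd, alpha=0.5)
    torch.save(merged_sd, "merged_model.pt")

The interpolation weight alpha controls the trade-off between the two parent models; alpha=0.5 gives a simple average, which is a common starting point before tuning the mixture on a validation set.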
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: Model merging, Alignment, Multilingual, Safety
Contribution Types: Approaches low compute settings-efficiency
Languages Studied: English, French, Spanish, Hindi, Arabic, Russian
Submission Number: 4344