Mix Data or Merge Models? Optimizing for Performance and Safety in Multilingual Contexts

ACL ARR 2025 May Submission 4344 Authors

19 May 2025 (modified: 03 Jul 2025) · CC BY 4.0
Abstract: Large Language Models (LLMs) are increasingly used worldwide for diverse applications. However, ensuring their safe use remains a complex challenge. To tackle this, safety is often embedded into models as a "behavior," yet it is frequently overfit to the harms prevalent in Western-centric datasets. In this work, we address this by systematically exploring the potential of model merging in a diverse multi-task setting: treating safety in LLMs as a "task" and combining models trained for safety-specific tasks with those trained for more general-purpose tasks, all within a multilingual context. We categorize our experiments into two primary groups, objective-based and language-based, according to the fine-tuning objective of the models being merged. Our results demonstrate that objective-based merging is significantly more effective than data mixing, yielding improvements of up to 8% in general performance and 10% in safety. We also find that language-based merging is highly effective: by merging monolingual models, we achieve a 4% increase in general performance and a 7% reduction in harm across all languages compared to the data mixing approach. Overall, our comprehensive study of model merging in the context of multilingual safety provides a useful framework for building strong, safe multilingual models without retraining.
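The abstract does not specify which merging operator the paper uses. For reference, a minimal sketch of linear weight interpolation, a standard model-merging baseline, is shown below in PyTorch. It assumes two checkpoints with identical architectures (e.g., a safety-tuned and a general-purpose model); all file paths and names are hypothetical.

    # Minimal sketch of linear weight merging, a common baseline.
    # Assumes both checkpoints share the same architecture and parameter names.
    import torch

    def linear_merge(state_dict_a, state_dict_b, alpha=0.5):
        """Interpolate two compatible state dicts: alpha * A + (1 - alpha) * B."""
        merged = {}
        for name, tensor_a in state_dict_a.items():
            tensor_b = state_dict_b[name]
            merged[name] = alpha * tensor_a + (1.0 - alpha) * tensor_b
        return merged

    # Hypothetical usage: checkpoint paths are placeholders.
    safety_sd = torch.load("safety_model.pt", map_location="cpu")
    general_sd = torch.load("general_model.pt", map_location="cpu")
    merged_sd = linear_merge(safety_sd, general_sd, alpha=0.5)
    torch.save(merged_sd, "merged_model.pt")

The interpolation weight alpha controls the trade-off between the two parent models; alpha=0.5 gives a simple average, which is a common starting point before tuning the mixture on a validation set.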
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: Model merging, Alignment, Multilingual, Safety
Contribution Types: Approaches low compute settings-efficiency
Languages Studied: English, French, Spanish, Hindi, Arabic, Russian
Submission Number: 4344