mR3: Multilingual Rubric-Agnostic Reward Reasoning Models

ICLR 2026 Conference Submission 14825 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: reward model, reasoning, rubric

TL;DR: We introduce mR3, a massively multilingual, rubric-agnostic reward reasoning model trained on 72 languages.

Abstract: Large Language Model (LLM) judges have been widely adopted for automatic evaluation in English, where they have been shown to be effective. However, their performance does not generalize well to non-English settings, and it remains unclear what constitutes effective multilingual training for such judges. In this paper, we introduce mR3, a massively multilingual, rubric-agnostic reward reasoning model trained on 72 languages, achieving the broadest language coverage in reward modeling to date. We present a comprehensive study of data and curriculum selection to identify effective training strategies and data sources for building high-quality reward models, including the integration of target-language reasoning datasets. Our approach attains state-of-the-art performance on multilingual reward model benchmarks, surpassing much larger models (e.g., GPT-OSS-120B) while being up to nine times smaller, and its effectiveness is further confirmed through extensive ablation studies. We will release our models and datasets publicly upon acceptance.

Supplementary Material: zip

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 14825
