mR3: Multilingual Rubric-Agnostic Reward Reasoning Models

Published: 26 Jan 2026, Last Modified: 11 Feb 2026ICLR 2026 PosterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: reward model, reasoning, rubric
TL;DR: We introduce mR3, a massively multilingual, rubric-agnostic reward reasoning model trained on 72 languages.
Abstract: Evaluation using Large Language Model (LLM) judges has been widely adopted in English and shown to be effective for automatic evaluation. However, their performance does not generalize well to non-English settings, and it remains unclear what constitutes effective multilingual training for such judges. In this paper, we introduce mR3, a massively multilingual, rubric-agnostic reward reasoning model trained on 72 languages, achieving the broadest language coverage in reward modeling to date. We present a comprehensive study of data and curriculum selection for training to identify effective strategies and data sources for building high-quality reward models, including support for reasoning in the target language. Our approach attains state-of-the-art performance on multilingual reward model benchmarks, surpassing much larger models (i.e., GPT-OSS-120B) while being up to 9x smaller, and its effectiveness is further confirmed through extensive ablation studies. Finally, we demonstrate the effectiveness of mR3 in off-policy preference optimization and validate the quality of its reasoning traces and rubric-based evaluations through human studies with 20 annotators across 12 languages, where mR3 models' reasoning is preferred, including for extremely low-resource languages that are entirely unseen during training. Our models, data, and code are available as open source at https://github.com/rubricreward/mr3.
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 14825
Loading