Keywords: multimodal reasoning, reinforcement learning, math reasoning
Abstract: Multimodal large language models (MLLMs) have rapidly advanced from perception tasks to complex multi-step reasoning, yet reinforcement learning with verifiable rewards (RLVR) often leads to spurious reasoning because only final-answer correctness is rewarded. To address this limitation, we propose **AutoRubric-R1V**, a framework that integrates RLVR with process-level supervision through automatically collected rubric-based generative rewards. Our key innovation is a scalable self-aggregation method that distills consistent reasoning checkpoints from successful trajectories, enabling problem-specific rubric construction without human annotation or stronger teacher models. By jointly leveraging rubric-based and outcome rewards, AutoRubric-R1V achieves state-of-the-art performance on six multimodal reasoning benchmarks and substantially improves reasoning faithfulness in dedicated evaluations.
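To make the joint reward concrete, below is a minimal Python sketch of how rubric-based process rewards might be combined with a verifiable outcome reward. The abstract does not specify the combination rule, so the weighted-sum mixing, the `alpha` parameter, and all identifiers (`Rubric`, `rubric_reward`, `joint_reward`, `judge`) are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rubric:
    """One problem-specific checkpoint distilled from successful trajectories."""
    description: str
    weight: float

def rubric_reward(trajectory: str, rubrics: list[Rubric],
                  judge: Callable[[str, str], bool]) -> float:
    """Generative rubric-based reward: a judge model checks whether each
    checkpoint is satisfied and returns the weighted fraction met."""
    total = sum(r.weight for r in rubrics)
    if total == 0:
        return 0.0
    met = sum(r.weight for r in rubrics if judge(trajectory, r.description))
    return met / total

def joint_reward(trajectory: str, answer: str, gold: str,
                 rubrics: list[Rubric],
                 judge: Callable[[str, str], bool],
                 alpha: float = 0.5) -> float:
    """Hypothetical joint reward: verifiable outcome correctness mixed with
    the process-level rubric score by a coefficient alpha."""
    outcome = 1.0 if answer.strip() == gold.strip() else 0.0
    process = rubric_reward(trajectory, rubrics, judge)
    return (1.0 - alpha) * outcome + alpha * process
```

Under this reading, a trajectory that reaches a wrong answer can still earn partial credit for satisfying intermediate checkpoints, which is one way process supervision could discourage the spurious reasoning the abstract describes.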
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 21985