Boosting RL-based Multimodal Reasoning via Difficulty Prior

17 Sept 2025 (modified: 14 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Difficulty Prior, Large Reasoning Model, Multi-modal Large Language Models, Data-centric AI
Abstract: In this work, we investigate how explicitly modeling a problem's *difficulty* as prior information shapes the effectiveness of reinforcement-learning-based fine-tuning for multi-modal reasoning. Our exploration comprises three perspectives. First, through *offline* data curation, we analyze the `U-shaped` difficulty distribution of two given datasets via multi-round sampling with the base model, filter out prompts that are either too simple or too difficult to provide meaningful gradients, and perform two-stage training on the remainder. Second, we implement online advantage differentiation, computing group-wise empirical accuracy as a *difficulty proxy* to adaptively reweight advantage estimates, providing stronger learning signals for more challenging problems. Finally, we introduce difficulty hints as explicit prompts for more complex samples in the second training stage, encouraging the model to calibrate its reasoning depth and perform reflective validation checks. Our comprehensive approach yields significant performance gains across various multi-modal mathematical reasoning benchmarks with only **2K**+**0.6K** two-stage training data.
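The offline curation and online advantage differentiation described in the abstract lend themselves to a short illustration. Below is a minimal sketch assuming a GRPO-style setup with binary correctness rewards over `G` rollouts per prompt; the function names (`keep_prompt`, `reweighted_advantages`), the accuracy thresholds, and the linear reweighting form are illustrative assumptions, not the submission's exact formulation.

```python
import numpy as np

def keep_prompt(group_rewards: np.ndarray, lo: float = 0.1, hi: float = 0.9) -> bool:
    """Offline curation: drop prompts the base model solves almost always
    (too easy) or almost never (too hard); both extremes give near-uniform
    group rewards and hence near-zero group-relative advantages.
    The thresholds lo/hi are assumed values, not from the paper."""
    acc = group_rewards.mean()
    return lo <= acc <= hi

def reweighted_advantages(group_rewards: np.ndarray) -> np.ndarray:
    """Group-normalized advantages scaled by a difficulty proxy.

    group_rewards: shape (G,), binary (0/1) correctness rewards for G
    rollouts of the same prompt.
    """
    acc = group_rewards.mean()  # group-wise empirical accuracy = difficulty proxy
    adv = (group_rewards - acc) / (group_rewards.std() + 1e-8)  # GRPO-style baseline
    weight = 1.0 + (1.0 - acc)  # low accuracy -> harder prompt -> larger weight (assumed linear form)
    return weight * adv

# Example: 8 rollouts, 2 correct -> a hard prompt that is kept and upweighted.
rewards = np.array([1, 0, 0, 1, 0, 0, 0, 0], dtype=float)
print(keep_prompt(rewards))            # True (accuracy 0.25 lies in [0.1, 0.9])
print(reweighted_advantages(rewards))  # correct rollouts receive an amplified positive advantage
```

The intuition behind the reweighting: without it, a rare correct rollout on a hard prompt contributes the same per-sample gradient magnitude as a routine success on an easy one, so scaling by low group accuracy directs more learning signal toward the challenging problems.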
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 8273