Boosting RL-based Multimodal Reasoning via Difficulty Prior

17 Sept 2025 (modified: 14 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Difficulty Prior, Large Reasoning Model, Multi-modal Large Language Models, Data-centric AI
Abstract: In this work, we investigate how explicitly modeling a problem's *difficulty* as prior information shapes the effectiveness of reinforcement-learning-based fine-tuning for multi-modal reasoning. Our exploration comprises three perspectives. First, through *offline* data curation, we analyze the `U-shaped` difficulty distribution of two given datasets via multi-round sampling with the base model, filter out prompts that are either too simple or too difficult to provide meaningful gradients, and perform two-stage training on the remainder. Second, we implement online advantage differentiation, computing group-wise empirical accuracy as a *difficulty proxy* to adaptively reweight advantage estimates, providing stronger learning signals for more challenging problems. Finally, we introduce difficulty hints as explicit prompts for more complex samples in the second training stage, encouraging the model to calibrate its reasoning depth and perform reflective validation checks. Our comprehensive approach yields significant performance gains across various multi-modal mathematical reasoning benchmarks with only **2K**+**0.6K** two-stage training data.
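The offline curation and online advantage differentiation described in the abstract lend themselves to a short illustration. Below is a minimal sketch assuming a GRPO-style setup with binary correctness rewards over `G` rollouts per prompt; the function names (`keep_prompt`, `reweighted_advantages`), the accuracy thresholds, and the linear reweighting form are illustrative assumptions, not the submission's exact formulation.

```python
import numpy as np

def keep_prompt(group_rewards: np.ndarray, lo: float = 0.1, hi: float = 0.9) -> bool:
    """Offline curation: drop prompts the base model solves almost always
    (too easy) or almost never (too hard); both extremes give near-uniform
    group rewards and hence near-zero group-relative advantages.
    The thresholds lo/hi are assumed values, not from the paper."""
    acc = group_rewards.mean()
    return lo <= acc <= hi

def reweighted_advantages(group_rewards: np.ndarray) -> np.ndarray:
    """Group-normalized advantages scaled by a difficulty proxy.

    group_rewards: shape (G,), binary (0/1) correctness rewards for G
    rollouts of the same prompt.
    """
    acc = group_rewards.mean()  # group-wise empirical accuracy = difficulty proxy
    adv = (group_rewards - acc) / (group_rewards.std() + 1e-8)  # GRPO-style baseline
    weight = 1.0 + (1.0 - acc)  # low accuracy -> harder prompt -> larger weight (assumed linear form)
    return weight * adv

# Example: 8 rollouts, 2 correct -> a hard prompt that is kept and upweighted.
rewards = np.array([1, 0, 0, 1, 0, 0, 0, 0], dtype=float)
print(keep_prompt(rewards))            # True (accuracy 0.25 lies in [0.1, 0.9])
print(reweighted_advantages(rewards))  # correct rollouts receive an amplified positive advantage
```

The intuition behind the reweighting: without it, a rare correct rollout on a hard prompt contributes the same per-sample gradient magnitude as a routine success on an easy one, so scaling by low group accuracy directs more learning signal toward the challenging problems.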
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 8273