Keywords: Reinforcement Learning, Multimodal, Reasoning, GRPO, Curriculum Learning
TL;DR: We propose VL-COGITO, a multimodal reasoning model trained with Progressive Curriculum RL that adaptively controls task difficulty and reasoning length, achieving SOTA or competitive performance across diverse reasoning benchmarks.
Abstract: Reinforcement learning has shown strong potential in improving the reasoning abilities of large language models, and recent studies extend this paradigm to multimodal reasoning. However, the complexity and diversity of multimodal tasks often lead to unstable performance across domains and difficulty levels. To address these challenges, we introduce VL-COGITO, a multimodal reasoning model trained with a multi-stage Progressive Curriculum Reinforcement Learning (PCuRL) framework. PCuRL gradually increases task difficulty, enhancing reasoning robustness in diverse contexts. It features two key innovations: (1) an online difficulty-aware weighting mechanism that dynamically adjusts task difficulty across training stages, and (2) a dynamic length reward that encourages adaptive control of reasoning path length to balance efficiency and accuracy. Experiments demonstrate that VL-COGITO achieves state-of-the-art performance on 8 out of 10 benchmark tasks spanning mathematics, science, logic, and general understanding, and comparable results on the remaining 2 tasks, validating the effectiveness of our approach.
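The two mechanisms named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration and not the PCuRL implementation: the function names, the linear weighting shape, the half-cosine length reward, and the per-stage `stage_target` accuracy are all hypothetical. The sketch weights a GRPO-style group-normalized advantage by how closely a prompt's rollout accuracy matches the difficulty the current stage is meant to emphasize.

```python
import math
from statistics import mean, pstdev


def difficulty_weight(accuracy, target):
    """Hypothetical weight in [0, 1]: 1.0 when the rollout accuracy equals
    the stage's target accuracy, falling linearly to 0.0 at the farthest
    possible accuracy. (Assumed shape, not taken from the paper.)"""
    return max(0.0, 1.0 - abs(accuracy - target) / max(target, 1.0 - target))


def length_reward(length, target_length):
    """Hypothetical length reward in [0, 1]: rises along a half-cosine as the
    response approaches target_length, then stays at 1.0 beyond it."""
    if target_length <= 0:
        return 0.0
    ratio = min(length / target_length, 1.0)
    return 0.5 * (1.0 - math.cos(math.pi * ratio))


def weighted_group_advantages(rewards, correct, stage_target, eps=1e-6):
    """Group-normalized advantages (reward minus group mean, divided by group
    standard deviation), scaled by the difficulty weight of the prompt."""
    accuracy = mean(1.0 if c else 0.0 for c in correct)
    weight = difficulty_weight(accuracy, stage_target)
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [weight * (r - mu) / (sigma + eps) for r in rewards]


# Usage: one prompt, four sampled responses, a stage that targets prompts
# the policy solves about half the time.
advantages = weighted_group_advantages(
    rewards=[1.0, 0.0, 1.0, 0.0],
    correct=[True, False, True, False],
    stage_target=0.5,
)
```

Under these assumptions, lowering `stage_target` in later stages shifts the training signal toward prompts the policy rarely solves, which is one way to read "gradually increases task difficulty". A length term such as `length_reward` would be added to the per-response reward before normalization.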
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 8644