Learning Generalizable Thinking Composition for Multimodal Reasoning

ACL ARR 2026 January Submission9308 Authors

06 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: Multimodal Large Language Models, Structured Reasoning, Generalization
Abstract: Recent advancements in multimodal reasoning have raised interest in improving interpretability and generalization, motivating structured reasoning as a promising approach. While prior studies have explored structured reasoning through various problem-specific designs, they often lack a unified perspective on how reasoning structures can be composed across tasks. In this work, we take a step toward addressing this gap by modeling reasoning as the dynamic composition of cognitive-inspired reasoning modules, enabling the system to construct flexible and generalizable reasoning chains. Specifically, we introduce a two-stage compositional reasoning framework: we first warm up the chain composer with high-quality module chains generated by feedback-guided greedy search, and then apply PPO-based structure policy optimization to learn a more generalizable composition policy. Our framework yields strong cross-task generalization. When trained on a single math-centric dataset, the composer consistently improves performance across diverse math benchmarks and further extends to general-domain multimodal tasks, indicating that it learns cross-task composition rules rather than dataset-specific heuristics. On the challenging MMMU-Pro benchmark, our model, based on Qwen2.5-VL-7B, achieves a 7.11% gain over its backbone, approaching the performance of the much larger Qwen2.5-VL-72B. These results support our central claim that reasoning-chain composition follows patterns that are consistent across tasks, which in turn enables the composer to learn reusable module-composition rules for multimodal reasoning.
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: Mathematical, Symbolic, Neurosymbolic, and Logical Reasoning, Question Answering
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 9308
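
The abstract's first stage, feedback-guided greedy search over module chains, can be illustrated with a minimal sketch. This is not the authors' implementation: the module names, the `greedy_compose` function, and the `toy_feedback` scorer are all hypothetical stand-ins, and the actual feedback signal used in the paper is not specified on this page. The sketch only shows the general greedy pattern of extending a chain one module at a time and stopping when no extension improves the score.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical module names, chosen for illustration only.
MODULES = ["perceive", "decompose", "calculate", "verify", "reflect"]

# Hypothetical "ideal" chain used by the toy scorer below.
TARGET = ["perceive", "decompose", "calculate", "verify"]


def toy_feedback(chain: List[str]) -> float:
    """Stand-in feedback: reward a correct prefix, penalize extra modules."""
    matches = 0
    for got, want in zip(chain, TARGET):
        if got != want:
            break
        matches += 1
    return matches - 0.5 * (len(chain) - matches)


def greedy_compose(
    modules: Sequence[str],
    score: Callable[[List[str]], float],
    max_len: int = 4,
) -> Tuple[List[str], float]:
    """Grow a module chain greedily, keeping the best-scoring extension."""
    chain: List[str] = []
    best = score(chain)
    for _ in range(max_len):
        # Score every one-module extension of the current chain.
        candidates = [(score(chain + [m]), m) for m in modules]
        top_score, top_module = max(candidates, key=lambda c: c[0])
        if top_score <= best:
            break  # no extension improves the feedback signal
        chain.append(top_module)
        best = top_score
    return chain, best


chain, final_score = greedy_compose(MODULES, toy_feedback, max_len=6)
print(chain, final_score)
```

In a setup like the one the abstract describes, chains found this way would presumably serve as warm-up targets for the composer before the PPO stage; how those chains are scored and filtered is an assumption left open here.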