Rhombus: Incentivizing Coordination in Parallel Thinking through Reinforcement Learning

ACL ARR 2026 January Submission 5960 Authors

05 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Large Language Models, Reinforcement Learning, Parallel Thinking, Test-Time Scaling
Abstract: Parallel thinking offers a promising avenue for scaling test-time compute in Large Language Models (LLMs), enabling them to explore diverse solution paths simultaneously before aggregating them into a final answer. However, coordinating the exploration and aggregation stages remains challenging: simple aggregation techniques often incur information loss, failing to preserve the subtle, decision-relevant signals generated during exploration. To overcome this, we propose Rhombus, a parallel thinking framework that explicitly incentivizes coordination between components via end-to-end reinforcement learning. Rhombus employs multiple parallel Proposers to generate compact, decision-focused reasoning cues and a central Synthesizer to integrate them into final predictions; the two are co-trained under a shared task reward to align their interaction. Across challenging mathematical reasoning benchmarks, Rhombus improves accuracy by 6.0% over long chain-of-thought baselines while reducing wall-clock latency by 39.4% under matched token budgets. Our work demonstrates that explicit communication optimization is essential for realizing the accuracy and efficiency gains of parallel reasoning.
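The two-stage flow the abstract describes can be illustrated with a minimal toy sketch: several Proposers explore in parallel and emit compact cues, and a Synthesizer merges them into one prediction. The LLM calls are replaced with trivial stubs, and all names (`propose`, `synthesize`, `parallel_think`) are hypothetical illustrations, not APIs from the paper; the Synthesizer here is a simple majority vote, whereas Rhombus trains a model to read the cues and generate the final answer.

```python
# Toy sketch of a Rhombus-style inference flow (assumptions, not the
# paper's implementation): parallel Proposers -> compact cues -> Synthesizer.
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

def propose(problem, seed):
    # Stand-in for one Proposer: in Rhombus this is an LLM sampling one
    # solution path and distilling it into a short, decision-focused cue.
    answer = (len(problem) + seed) % 10  # toy "solution path"
    return {"answer": answer, "cue": f"path-{seed} suggests {answer}"}

def synthesize(cues):
    # Stand-in for the central Synthesizer: here a majority vote over the
    # proposed answers. The trained Synthesizer instead conditions on all
    # cues and generates the final prediction, so subtle signals in the
    # cues can change the outcome (the coordination the paper optimizes).
    votes = Counter(c["answer"] for c in cues)
    return votes.most_common(1)[0][0]

def parallel_think(problem, n_proposers=4):
    # Proposers run concurrently, mirroring the parallel exploration stage;
    # in training, Proposers and Synthesizer share one task reward.
    with ThreadPoolExecutor(max_workers=n_proposers) as pool:
        cues = list(pool.map(lambda s: propose(problem, s), range(n_proposers)))
    return synthesize(cues)

print(parallel_think("What is 17 * 23?"))
```

Note that the wall-clock savings reported in the abstract come from this structure: each Proposer's path is short and runs concurrently, so latency tracks the longest single path rather than one long chain-of-thought.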
Paper Type: Long
Research Area: Language Models
Research Area Keywords: Language Modeling, Generation
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 5960