Instilling Parallel Reasoning into Language Models

Matthew Macfarlane; Minseon Kim; Nebojsa Jojic; Weijia Xu; Lucas Caccia; Xingdi Yuan; Wanru Zhao; Zhengyan Shi; Alessandro Sordoni

Published: 09 Jul 2025, Last Modified: 16 Jul 2025

Venue: AI4Math@ICML25 Poster

License: CC BY-NC-SA 4.0

Keywords: Reasoning, Language Models, Parallel Reasoning, Chain of Thought

Abstract: Sequential chain-of-thought reasoning significantly improves the performance of large language models (LLMs) on complex tasks. However, sequential reasoning has structural limitations: long chains are expensive due to attention's quadratic complexity, and multiple diverse strategies cannot be considered simultaneously. To address this, we propose a method that instills parallel reasoning capabilities in LLMs by distilling parallel reasoning traces from a teacher model. This approach enables models to decompose problems, explore diverse strategies via concurrent reasoning traces, and aggregate trace outputs into a final answer. Evaluating on a variety of math and puzzle benchmarks such as MATH 500, AIME, and Countdown, we show that our approach can decompose parallelizable problems and that performance scales with the number of parallel traces. The resulting model can dynamically allocate reasoning strategies based on problem complexity, outperforming standard sampling methods.
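The cost argument in the abstract can be illustrated with a small back-of-the-envelope calculation. The sketch below is not the authors' cost model: it assumes attention cost grows as the square of the sequence length, that a fixed token budget is split evenly across traces, and that traces do not attend to one another. It also ignores the shared prompt and the aggregation step. The function names are hypothetical and chosen only for illustration.

```python
def attention_cost(num_tokens: int) -> int:
    """Pairwise token interactions in self-attention, up to a constant factor (O(n^2))."""
    return num_tokens * num_tokens


def sequential_cost(total_tokens: int) -> int:
    """Cost of one long chain that uses the whole token budget."""
    return attention_cost(total_tokens)


def parallel_cost(total_tokens: int, num_traces: int) -> int:
    """Cost of splitting the same budget across independent traces.

    Assumption: traces have equal length and never attend to each other.
    """
    tokens_per_trace = total_tokens // num_traces
    return num_traces * attention_cost(tokens_per_trace)


if __name__ == "__main__":
    total = 8000  # illustrative token budget, not a figure from the paper
    for k in (1, 2, 4, 8):
        ratio = sequential_cost(total) / parallel_cost(total, k)
        print(f"{k} trace(s): sequential/parallel attention cost ratio = {ratio:.0f}")
```

Under these simplifying assumptions, splitting a budget of n tokens into k traces reduces the quadratic term from n^2 to n^2 / k, which is the structural advantage the abstract points to.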

Submission Number: 136
