Deep Thinking via Recursive Self-Aggregation

Published: 23 Sept 2025, Last Modified: 07 Dec 2025 · FoRLM 2025 · CC BY 4.0
Keywords: LLM, reasoning, RL
TL;DR: We introduce Recursive Self-Aggregation, a test-time scaling method inspired by genetic algorithms, where an LLM iteratively aggregates its own reasoning traces to refine and improve solutions.
Abstract: Large language models (LLMs) exhibit strong reasoning capabilities through chain-of-thought (CoT) prompting, but their outputs remain unreliable due to high variability across reasoning trajectories. Parallel scaling methods like majority voting improve accuracy but cannot "think deeper". On the other hand, sequential refinement risks locking the model into an incorrect reasoning path from which it cannot escape. In this work, we show that LLMs can serve as aggregators over multiple CoTs, cross-referencing trajectories to identify errors and synthesize higher-quality responses. We propose Recursive Self-Aggregation (RSA), an evolutionary framework for deep thinking with increased test-time compute: aggregated CoTs are reintroduced as candidate proposals in subsequent rounds, allowing the model to progressively refine answers through iterative reasoning. This recursive aggregation, a hybrid scaling strategy, yields monotonically improving performance with increasing token budgets. We also demonstrate that reinforcement learning (RL) finetuning can be made aggregation-aware, yielding policies that achieve superior inference-time performance under recursive aggregation compared to those trained solely for direct solution generation. On math reasoning tasks and Countdown, RSA significantly outperforms baseline approaches, including purely parallel and sequential strategies, with RL-trained aggregation providing additional gains.
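
For concreteness, the recursive loop described in the abstract might look like the following Python sketch. Here `generate` and `aggregate` are hypothetical wrappers around LLM calls (sampling one CoT solution, and prompting the model to cross-reference several candidate solutions into an improved one, respectively); the population size, number of rounds, and aggregation subset size are illustrative hyperparameters, not the paper's exact configuration.

```python
import random

def rsa(problem, generate, aggregate,
        population_size=8, rounds=4, group_size=3):
    """Minimal sketch of Recursive Self-Aggregation (RSA).

    generate(problem) -> str
        Hypothetical LLM call sampling one chain-of-thought solution.
    aggregate(problem, candidates) -> str
        Hypothetical LLM call that cross-references a list of candidate
        solutions and synthesizes a single refined solution.
    """
    # Round 0: sample an initial population of reasoning traces in parallel.
    population = [generate(problem) for _ in range(population_size)]

    for _ in range(rounds):
        new_population = []
        for _ in range(population_size):
            # Select a small subset of candidates and ask the model to
            # aggregate them into one higher-quality solution.
            group = random.sample(population, group_size)
            new_population.append(aggregate(problem, group))
        # Aggregated traces become the candidate proposals for the
        # next round, so refinement compounds across rounds.
        population = new_population

    return population
```

Under this reading, parallel scaling corresponds to the initial sampling step, sequential refinement to repeated single-trace revision, and RSA hybridizes the two by recursing aggregation over a maintained population.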
Submission Number: 183