Training Large Language Models To Reason In Parallel With Global Forking Tokens

ICLR 2026 Conference Submission 19784 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: large language model, reasoning, chain of thoughts
TL;DR: We treat parallel reasoning as a set prediction problem and incorporate a set-based global loss into SFT using bipartite matching between global forking tokens and diverse reasoning traces.
Abstract: Although LLMs have demonstrated improved performance by scaling parallel test-time compute, doing so relies on generating reasoning paths that are both diverse and accurate. For challenging problems, the forking tokens that trigger diverse yet correct reasoning modes are typically deep in the sampling tree. Consequently, common strategies to encourage diversity, such as temperature scaling, encounter a worsened trade-off between diversity and accuracy. Motivated by this challenge, we treat parallel reasoning as a set-of-next-token-prediction problem and incorporate a set-based global loss into Supervised Fine-Tuning (SFT) using bipartite matching between global forking tokens and unique reasoning traces. We observe that, whereas naive fine-tuning with multiple reasoning traces collapses these unique reasoning modes, our proposed method, Set Supervised Fine-Tuning (SSFT), preserves these modes and produces emergent global forking tokens. Experiments on multiple reasoning benchmarks show our SSFT method consistently outperforms SFT under both pass@1 and cons@k metrics.
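The abstract describes a set-based SFT loss built on bipartite matching between global forking tokens and reference reasoning traces. Below is a minimal, illustrative sketch of such a matched loss, assuming K reserved forking tokens, K reference traces per problem, and a precomputed K-by-K matrix of per-trace negative log-likelihoods; the names (`set_sft_loss`, `per_trace_nll`) are placeholders, not the authors' implementation.

```python
# Illustrative sketch of a set-based SFT loss via bipartite matching.
# Assumption: per_trace_nll[i, j] is the NLL of reference reasoning trace j
# when generation is prefixed with global forking token i (computed by K
# forward passes of the causal LM, one per forking-token prefix).
import torch
from scipy.optimize import linear_sum_assignment


def set_sft_loss(per_trace_nll: torch.Tensor) -> torch.Tensor:
    """Hungarian-matched SFT loss over a [K, K] token-by-trace NLL matrix."""
    # Match each forking token to the trace it currently explains best,
    # so distinct tokens are pushed toward distinct reasoning modes
    # (a DETR-style set prediction objective).
    cost = per_trace_nll.detach().cpu().numpy()
    rows, cols = linear_sum_assignment(cost)
    idx = (torch.as_tensor(rows), torch.as_tensor(cols))
    # Only the matched (forking token, trace) pairs contribute to the loss.
    return per_trace_nll[idx].mean()
```

In this sketch the matching is recomputed per training example, so the assignment of forking tokens to reasoning modes is free to emerge during fine-tuning rather than being fixed in advance.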
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 19784