Continuous Chain of Thought Enables Parallel Exploration and Reasoning

Published: 10 Jun 2025, Last Modified: 15 Jul 2025 · MOSS@ICML2025 · CC BY 4.0
Keywords: chain-of-thought, latent space reasoning, parallel exploration, transformers, policy optimization, multi-token sampling
TL;DR: This paper establishes theoretical benefits of chain-of-thought with continuous tokens (CoT2) and introduces new supervision and policy optimization strategies to train CoT2 models.
Abstract: We propose CoT2, a framework using continuously-valued tokens that enables language models to track multiple reasoning paths in parallel. We also provide a novel CoT2 supervision strategy in which the model's softmax outputs are matched to the empirical token distributions of a set of target traces. Theoretically, we show that CoT2 offers sample-complexity benefits, and we construct a one-layer transformer that solves the subset-sum problem given sufficient embedding capacity. We further introduce continuous sampling methods and show that reinforcement learning with CoT2 notably improves logical reasoning performance compared to discrete and continuous baselines.
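The supervision strategy described above can be sketched in a few lines. This is a hypothetical illustration of the idea only, not the paper's implementation: the function names, shapes, and the cross-entropy formulation are assumptions made for clarity.

```python
import numpy as np

def empirical_token_distribution(target_traces, step, vocab_size):
    """Fraction of target traces emitting each token at a given reasoning step.

    target_traces: list of token-id sequences (the set of valid target traces).
    Returns a probability vector over the vocabulary (an assumed representation).
    """
    dist = np.zeros(vocab_size)
    for trace in target_traces:
        dist[trace[step]] += 1.0
    return dist / len(target_traces)

def softmax(logits):
    # Numerically stable softmax over the vocabulary dimension.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def cot2_step_loss(logits, target_traces, step, vocab_size):
    """Cross-entropy between the model's softmax output at `step` and the
    empirical token distribution of the target traces (illustrative loss)."""
    p = softmax(logits)
    q = empirical_token_distribution(target_traces, step, vocab_size)
    return -np.sum(q * np.log(p + 1e-12))
```

For example, if three of four target traces emit token 1 at step 0, the target distribution at that step is 0.75 on token 1 and 0.25 on token 0, rather than a one-hot label; the model is thus trained to keep multiple reasoning paths alive in a single soft output.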
Code: ipynb
Submission Number: 65