Keywords: Reasoning, LLM, Chain-of-Thought
Abstract: Small-large model collaboration is a promising approach for efficient reasoning, where lightweight assistant models generate intermediate representations to guide larger, more capable models. However, this paradigm encounters two key challenges: \textbf{representation heterogeneity} between different model architectures and \textbf{unidirectional information flow} that prevents mutual learning.
Small assistant models and large base models develop distinct geometric structures for encoding similar concepts, making direct alignment difficult and leading to information degradation.
Additionally, unidirectional flow creates asymmetric dynamics where assistant models cannot benefit from large models' superior representational capacity.
We introduce \textbf{CycleCoT}, a bidirectional framework that addresses these bottlenecks through cycle-consistent soft thought alignment. Our approach uses dual residual transformation networks to establish invertible mappings between heterogeneous model spaces through three mechanisms: (1) expressive mappings between different model representations, (2) bidirectional alignment objectives enforcing semantic consistency in both directions, and (3) cycle consistency constraints preserving information during round-trip transformations. This enables large models' knowledge to enhance assistant models' soft thought generation, creating symbiotic collaboration. Evaluation on LLaMA-3.1-8B-Instruct and Qwen2.5-7B-Instruct across mathematical, commonsense, and symbolic reasoning benchmarks demonstrates consistent improvements over unidirectional baselines, with gains of up to $5.5\%$ on mathematical reasoning tasks.
Our analysis reveals that alignment quality matters more than quantity: a few well-aligned soft thoughts outperform longer, less-aligned sequences.
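The bidirectional alignment and cycle-consistency objectives described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the `ResidualMap` module, the shared hidden size, the loss weighting, and the toy tensors are all assumptions introduced for illustration.

```python
import torch
import torch.nn as nn

class ResidualMap(nn.Module):
    """Hypothetical residual transformation between two representation spaces."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        # Residual connection keeps the map close to identity, which helps
        # the forward/backward pair behave approximately invertibly.
        return x + self.proj(x)

dim = 64  # assumed shared hidden size for the toy example
f = ResidualMap(dim)  # assistant space -> base-model space
g = ResidualMap(dim)  # base-model space -> assistant space

a = torch.randn(8, dim)  # toy stand-in for assistant soft thoughts
b = torch.randn(8, dim)  # toy stand-in for base-model representations

mse = nn.MSELoss()
# (2) bidirectional alignment: each mapped representation should match
# its counterpart in the other model's space.
align_loss = mse(f(a), b) + mse(g(b), a)
# (3) cycle consistency: a round trip through both maps should
# reconstruct the original representation.
cycle_loss = mse(g(f(a)), a) + mse(f(g(b)), b)

loss = align_loss + 0.5 * cycle_loss  # weighting is an assumption
loss.backward()
```

In practice the two models' hidden sizes may differ, in which case the input and output dimensions of each map would differ accordingly; the equal-dimension setup here is purely for brevity.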
Primary Area: foundation or frontier models, including LLMs
Submission Number: 25231