Abstract: Chain-of-Thought (CoT) distillation transfers reasoning capabilities from large to small language models through supervised training. Typically, smaller models are trained to first generate rationales (step-by-step guidance distilled from larger models) and then produce answers conditioned jointly on these rationales and the original questions. However, we identify two critical limitations: (1) Error accumulation: CoT's sequential execution propagates early-stage mistakes to downstream steps through causal dependencies; (2) Non-causal attribution: models often conflate incidental information with logically necessary conditions. To address these issues, we propose Loop-of-Thought (LoT), a cyclic verification architecture that implements reflective cognition. The framework features: (1) Forward reasoning: cascaded forward inference from questions to answers via learned rationales; (2) Backward validation: answer-guided backward tracing that identifies critical reasoning anchors. By enforcing cycle-consistency constraints between these dual processes, LoT achieves self-correcting closed-loop reasoning. Additionally, we introduce Rationale Purification, an auxiliary task that prunes redundant reasoning steps while preserving valid logic pathways. Experiments across multiple benchmark datasets demonstrate that our LoT framework improves complex reasoning performance compared to existing CoT distillation methods.
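The abstract does not specify the training objective, so the following PyTorch sketch is only a rough illustration of how a forward-reasoning loss, a backward-validation loss, and a cycle-consistency term could be combined; the function and argument names (lot_training_loss, cycle_weight, the answer-representation tensors) are hypothetical and not taken from the paper.

```python
import torch
import torch.nn.functional as F


def lot_training_loss(forward_logits, backward_logits,
                      rationale_targets, anchor_targets,
                      fwd_answer_repr, bwd_answer_repr,
                      cycle_weight=0.1):
    """Illustrative combination of the three terms described in the abstract.

    forward_logits / rationale_targets: question -> rationale -> answer path.
    backward_logits / anchor_targets: answer -> reasoning-anchor path.
    fwd_answer_repr / bwd_answer_repr: answer representations from the two
    directions, pulled together by a cycle-consistency penalty.
    """
    # Forward reasoning: token-level cross-entropy on the rationale/answer sequence.
    loss_fwd = F.cross_entropy(
        forward_logits.view(-1, forward_logits.size(-1)),
        rationale_targets.view(-1), ignore_index=-100)
    # Backward validation: token-level cross-entropy on the traced anchors.
    loss_bwd = F.cross_entropy(
        backward_logits.view(-1, backward_logits.size(-1)),
        anchor_targets.view(-1), ignore_index=-100)
    # Cycle-consistency: penalize disagreement between the two directions.
    loss_cyc = F.mse_loss(fwd_answer_repr, bwd_answer_repr)
    return loss_fwd + loss_bwd + cycle_weight * loss_cyc


# Toy usage with random tensors (batch of 2, sequence length 8, vocab 100, hidden 16).
fwd_logits = torch.randn(2, 8, 100)
bwd_logits = torch.randn(2, 8, 100)
rationale_tgt = torch.randint(0, 100, (2, 8))
anchor_tgt = torch.randint(0, 100, (2, 8))
loss = lot_training_loss(fwd_logits, bwd_logits, rationale_tgt, anchor_tgt,
                         torch.randn(2, 16), torch.randn(2, 16))
print(loss.item())
```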
External IDs: dblp:conf/nlpcc/JiSJQZY25