variCOT: Variational Inference for Implicit Chain-of-Thought in Language Models

19 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Large Language Model, Chain-of-Thought, Variational Inference, Latent Reasoning
TL;DR: We derive an effective and efficient unified variational framework for training language models to reason in latent space.
Abstract: Chain-of-Thought (CoT) prompting elicits remarkable capabilities in large language models but is fundamentally constrained by the low-bandwidth, sequential nature of text generation. Implicit CoT methods promise to accelerate inference by reasoning in latent space, yet they often rely on heuristic architectures and complex multi-stage training, lacking a unified, principled foundation. We introduce VARICOT, the first principled variational framework to formulate implicit reasoning as a structured probabilistic inference problem. VARICOT learns a continuous latent variable, Z, that represents the entire reasoning process and is optimized via a single, unified evidence lower bound (ELBO) objective. Our key architectural innovation, Guided Latent Reasoning, treats Z as a global reasoning context that modulates the model's computations at every layer via cross-attention. This design decouples the abstract reasoning state from its linguistic realization, enabling high-bandwidth guidance without altering the standard autoregressive generation process. Implemented within a single Transformer and trained end-to-end with strategic control tokens, VARICOT offers flexible inference: it can either generate answers directly for a >2.5x speedup or reproduce the full rationale when needed. On benchmarks such as GSM8K and CommonsenseQA, VARICOT matches or substantially improves upon the accuracy of explicit CoT while drastically reducing latency, establishing a theoretically grounded and scalable paradigm for efficient reasoning.
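
For orientation only, a generic conditional ELBO of the kind the abstract describes is sketched below; the notation (x for the question, y for the answer, z for the latent reasoning variable, and the particular factorization of the prior and posterior) is assumed here, and the paper's exact objective may differ, e.g. by adding rationale-reconstruction or control-token terms:

$$
\log p_\theta(y \mid x)\;\ge\;\mathbb{E}_{q_\phi(z \mid x, y)}\!\left[\log p_\theta(y \mid x, z)\right]\;-\;\mathrm{KL}\!\left(q_\phi(z \mid x, y)\,\big\|\,p_\theta(z \mid x)\right)
$$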
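To make the "global latent modulating every layer via cross-attention" idea concrete, the following is a minimal PyTorch sketch under stated assumptions: the class name GuidedLatentBlock, the layer layout, and all dimensions are hypothetical illustrations, not the authors' implementation.

```python
import torch
import torch.nn as nn


class GuidedLatentBlock(nn.Module):
    """One Transformer layer augmented with cross-attention to a global latent
    reasoning context z (illustrative sketch; not the VARICOT reference code)."""

    def __init__(self, d_model: int, n_heads: int, d_latent: int):
        super().__init__()
        # Standard causal self-attention over token states.
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Cross-attention: token states query the latent reasoning context z.
        self.cross_attn = nn.MultiheadAttention(
            d_model, n_heads, batch_first=True, kdim=d_latent, vdim=d_latent
        )
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, h: torch.Tensor, z: torch.Tensor,
                attn_mask: torch.Tensor | None = None) -> torch.Tensor:
        # h: (batch, seq, d_model) token hidden states
        # z: (batch, n_latent, d_latent) global latent reasoning context
        q = self.norm1(h)
        a, _ = self.self_attn(q, q, q, attn_mask=attn_mask)  # causal self-attention
        h = h + a
        c, _ = self.cross_attn(self.norm2(h), z, z)           # tokens attend to z
        h = h + c
        h = h + self.ffn(self.norm3(h))
        return h


if __name__ == "__main__":
    # Usage sketch: a single latent vector guides every token position.
    block = GuidedLatentBlock(d_model=64, n_heads=4, d_latent=32)
    h = torch.randn(2, 10, 64)   # token hidden states
    z = torch.randn(2, 1, 32)    # one global latent reasoning vector per sequence
    print(block(h, z).shape)     # torch.Size([2, 10, 64])
```

Because z enters only through cross-attention, the autoregressive token-by-token decoding path is unchanged; stacking such blocks lets the same latent steer computation at every depth, which matches the decoupling of reasoning state from linguistic realization described in the abstract.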
Primary Area: foundation or frontier models, including LLMs
Submission Number: 19195