Keywords: Large Language Model, Chain-of-Thought, Variational Inference, Latent Reasoning
TL;DR: We derive an effective and efficient unified variational framework for training language models that reason in latent space
Abstract: Chain-of-Thought (CoT) reasoning dramatically improves language model performance but incurs significant computational overhead through sequential token generation. While implicit CoT methods promise efficiency by operating in latent space, they largely rely on heuristic architectures and complex multi-stage training (e.g., distillation), and they lack a principled objective for end-to-end optimization. We introduce variCoT, a principled variational framework that overcomes these limitations through a unified evidence lower bound (ELBO) objective. Implemented in a single Transformer with strategic control tokens, variCoT learns a continuous latent reasoning trace $Z$ and deploys it via \textit{guided latent reasoning}: $Z$ acts as a cross-attention query that guides generation across all layers, decoupling abstract reasoning from linguistic realization. This enables flexible inference, either direct answer generation (2.5$\times$ faster) or optional full CoT reproduction, without architectural fragmentation. On GSM8K and CommonsenseQA, variCoT matches or exceeds explicit CoT accuracy while significantly reducing latency, establishing a theoretically grounded and scalable approach to efficient reasoning.
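For orientation, a minimal sketch of the form such an ELBO objective typically takes; this is our assumption based on the abstract, not the paper's stated objective. Here $X$ (question) and $Y$ (answer) are illustrative notation, while $Z$ is the continuous latent reasoning trace named in the abstract:

$$\log p_\theta(Y \mid X) \;\ge\; \mathbb{E}_{q_\phi(Z \mid X, Y)}\!\left[\log p_\theta(Y \mid X, Z)\right] \;-\; \mathrm{KL}\!\left(q_\phi(Z \mid X, Y)\,\big\|\,p_\theta(Z \mid X)\right)$$

where $q_\phi$ is an approximate posterior over latent traces and $p_\theta(Z \mid X)$ a learned prior; variCoT's actual objective may differ in its factorization, e.g., by also reconstructing the explicit CoT as part of the bound.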
Primary Area: foundation or frontier models, including LLMs
Submission Number: 19195