Keywords: Large Language Model, Chain-of-Thought, Variational Inference, Latent Reasoning
TL;DR: We derive an effective and efficient unified variational framework for training language models that reason in latent space
Abstract: Chain-of-Thought (CoT) reasoning dramatically improves language model performance but incurs significant computational overhead through sequential token generation. While implicit CoT methods promise efficiency by operating in latent space, they largely rely on heuristic architectures and complex multi-stage training (e.g., distillation), and they lack a principled objective for end-to-end optimization. We introduce variCoT, a principled variational framework that overcomes these limitations through a unified evidence lower bound (ELBO) objective. Implemented in a single Transformer with strategic control tokens, variCoT learns a continuous latent reasoning trace $Z$ and deploys it via \textit{guided latent reasoning}: $Z$ acts as a cross-attention query that guides generation across all layers, decoupling abstract reasoning from linguistic realization. This enables flexible inference, either direct answer generation (2.5$\times$ faster) or optional full CoT reproduction, without architectural fragmentation. On GSM8K and CommonsenseQA, variCoT matches or exceeds explicit CoT accuracy while significantly reducing latency, establishing a theoretically grounded and scalable approach to efficient reasoning.
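For orientation, a minimal sketch of the form such an ELBO objective typically takes; this is our assumption based on the abstract, not the paper's stated objective. Here $X$ (question) and $Y$ (answer) are illustrative notation, while $Z$ is the continuous latent reasoning trace named in the abstract:

$$\log p_\theta(Y \mid X) \;\ge\; \mathbb{E}_{q_\phi(Z \mid X, Y)}\!\left[\log p_\theta(Y \mid X, Z)\right] \;-\; \mathrm{KL}\!\left(q_\phi(Z \mid X, Y)\,\big\|\,p_\theta(Z \mid X)\right)$$

where $q_\phi$ is an approximate posterior over latent traces and $p_\theta(Z \mid X)$ a learned prior; variCoT's actual objective may differ in its factorization, e.g., by also reconstructing the explicit CoT as part of the bound.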
Primary Area: foundation or frontier models, including LLMs
Submission Number: 19195