Not All Tokens Need to Be Said: Selective Latent Execution for Efficient Chain-of-Thought Reasoning

Published: 28 Apr 2026, Last Modified: 28 Apr 2026
Venue: MSLD 2026 Poster
Readers: Everyone
License: CC BY 4.0
Keywords: Chain-of-Thought, Latent Reasoning, Test-Time Compute, Hierarchical Reasoning, Reasoning Efficiency
TL;DR: SLE improves reasoning efficiency by keeping high-level planning explicit while compressing low-level execution into adaptive latent loops, reducing unnecessary token generation without sacrificing accuracy or interpretability.
Abstract: Chain-of-thought reasoning improves language model performance on complex tasks, but current models treat all reasoning tokens uniformly, mixing high-level planning with low-level execution even though these steps contribute differently to the final answer. Prior work suggests that many execution tokens are expendable: they often have low counterfactual importance, longer traces can hurt accuracy, and RL training benefits mainly from a small set of high-entropy tokens. We propose Selective Latent Execution (SLE), an architecture that preserves high-entropy planning steps as explicit chain of thought while compressing low-entropy execution steps into looped hidden-state iterations, with the number of iterations set adaptively by a learned hazard-based exit gate. SLE is trained in three stages: supervised warmup, progressive execution masking, and GRPO with entropy-weighted credit assignment and an execution-specific length penalty. We evaluate SLE on GSM8K, MATH 500, and AIME using Ouro 1.4B, measuring accuracy, token count, inference compute, latency, and accuracy per compute, and we additionally probe the latent intermediate representations.
Submission Number: 170
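
Illustrative sketch (not from the submission): the abstract mentions a "hazard based exit gate" controlling the depth of the latent loop but does not define it. One common reading, assumed here, is the discrete-time survival formulation: at each loop iteration t the gate emits a hazard h_t, the probability of exiting at t given that the loop has not exited earlier. The function names, the forced exit at the final iteration, and the example hazard values below are all hypothetical; in the actual method the hazards would presumably come from a learned function of the hidden state.

```python
from typing import List


def exit_distribution(hazards: List[float]) -> List[float]:
    """Convert per-iteration hazards into a distribution over exit depths.

    hazards[t] is the assumed probability of exiting at iteration t + 1,
    given that the loop has not exited before.
    """
    if not hazards:
        raise ValueError("need at least one loop iteration")
    probs = []
    survival = 1.0  # probability that the loop is still running
    for h in hazards:
        probs.append(survival * h)  # exit exactly at this iteration
        survival *= 1.0 - h         # keep looping
    # Assumption: any leftover mass exits at the last allowed iteration.
    probs[-1] += survival
    return probs


def expected_depth(hazards: List[float]) -> float:
    """Expected number of latent iterations under the exit distribution."""
    return sum((t + 1) * p for t, p in enumerate(exit_distribution(hazards)))
```

Under this reading, a gate that emits larger hazards early yields a shallower expected depth, which is one way that adaptive depth could trade compute against accuracy on easy versus hard execution steps.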