SLAT: Segment-Level Adaptive Trimming for Efficient CoT Reasoning

Jian Yao; Xiongcai Luo; Ran Cheng; KC Tan

Published: 30 Apr 2026, Last Modified: 24 Jun 2026, ICML 2026 regular, CC BY 4.0

Abstract: Recent advances in Large Reasoning Models have significantly improved chain-of-thought (CoT) capabilities via reinforcement learning (RL). However, generated reasoning chains frequently suffer from structural redundancy (i.e., overthinking), incurring high computational overhead without improving answer correctness. Existing mitigation strategies typically rely on token-uniform length penalties, which provide coarse, segment-agnostic pressure toward shorter outputs and can inadvertently suppress useful reasoning alongside redundancy. To address this, we demonstrate that inefficiency concentrates in high-probability segments with low marginal utility. We derive a theoretical characterization of segment suboptimality under the correctness-length trade-off objective and propose SLAT (Segment-Level Adaptive Trimming), an RL framework that selectively suppresses redundant segments based on this criterion. Empirical results on standard benchmarks indicate that SLAT establishes a superior accuracy-efficiency Pareto frontier, reducing reasoning length by 50% relative to uncompressed baselines while maintaining competitive accuracy. Overall, our results suggest that theoretically grounded, segment-aware trimming is a promising direction for efficient CoT reasoning in large language models.

Lay Summary: Large language models often solve difficult problems by generating long chains of thought, but many parts of these reasoning traces can be repetitive and add substantial computational cost without improving the final answer. This paper introduces SLAT, a training method that encourages models to produce shorter and more efficient reasoning. Instead of simply penalizing all long outputs, SLAT targets reasoning segments that are both highly predictable under the model and persist over a nontrivial span, which are more likely to be redundant. The method is applied during reinforcement learning and is gated by correctness, so the model is encouraged to shorten successful reasoning while preserving answer quality. Experiments on mathematical reasoning benchmarks show that SLAT can substantially reduce reasoning length with little or no loss in accuracy. These results suggest that reasoning models can be made more efficient without sacrificing their problem-solving ability.
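
Illustrative sketch (not the authors' implementation): the summary describes penalizing segments that are highly predictable and span more than a few tokens, with the penalty applied only to correct answers. The Python below is one possible reading of that idea. The function names, the log-probability threshold, the minimum span, the per-token penalty, and the 0/1 base reward are all hypothetical choices made for illustration; the paper's actual segment criterion and reward are derived from its theoretical analysis and may differ.

```python
from typing import List, Tuple


def find_confident_segments(
    token_logprobs: List[float],
    logprob_threshold: float = -0.1,  # hypothetical cutoff for "highly predictable"
    min_span: int = 4,                # hypothetical minimum run length
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges of consecutive tokens whose
    log-probability is at or above the threshold for at least min_span tokens."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, lp in enumerate(token_logprobs):
        if lp >= logprob_threshold:
            if start is None:
                start = i  # a new high-probability run begins here
        else:
            if start is not None and i - start >= min_span:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(token_logprobs) - start >= min_span:
        segments.append((start, len(token_logprobs)))
    return segments


def shaped_reward(
    token_logprobs: List[float],
    is_correct: bool,
    penalty_per_token: float = 0.01,  # hypothetical penalty weight
) -> float:
    """Correctness-gated reward: incorrect responses receive 0 and no trimming
    pressure; correct responses receive 1 minus a penalty on flagged tokens."""
    if not is_correct:
        return 0.0
    flagged = sum(end - start for start, end in find_confident_segments(token_logprobs))
    return 1.0 - penalty_per_token * flagged


# Toy trace: two long predictable runs (5 and 4 tokens) and one short run (2 tokens).
example_logprobs = [-0.01] * 5 + [-2.0] + [-0.02] * 2 + [-1.5] + [-0.05] * 4
example_segments = find_confident_segments(example_logprobs)
```

In this toy trace only the two longer runs are flagged, so the short predictable run is left alone, and an incorrect response is never penalized for length. That mirrors the contrast the summary draws with a uniform per-token length penalty.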

Originally Submitted Supplementary Material: zip

Primary Area: Deep Learning->Large Language Models

Keywords: Efficient Reasoning; RL for LLM; LLM Reasoning

Originally Submitted PDF: pdf

Submission Number: 10852
