Step-Tagging: Toward controlling the generation of Language Reasoning Models through step monitoring
Keywords: Large Reasoning Models, Efficient inference, Monitoring Large Language Models, Interpretable Early-Stopping
TL;DR: We introduce the Step-Tagging framework for real-time annotation of reasoning-step types and use it to develop interpretable early-stopping criteria that dynamically adapt LRM inference.
Abstract: The field of Language Reasoning Models (LRMs) has advanced rapidly over the past few years, with new training and inference techniques enabling LRMs to reason longer, deeper, and more accurately. However, a growing body of work shows that LRMs remain inefficient, over-generating verification and self-reflection steps. To address this challenge, we introduce the Step-Tagging framework, a lightweight sentence classifier that annotates in real time the type of reasoning step an LRM is generating. To cover the wide space of reasoning behaviors, we introduce ReasonType, a novel taxonomy of reasoning steps. Building on this framework, we demonstrate that online monitoring of the counts of specific step types yields effective, interpretable early-stopping criteria for LRM inference. We evaluate the Step-Tagging framework on three open-source reasoning models across two standard benchmarks, MATH500 and GSM8K, achieving a 30-40% token reduction while maintaining accuracy comparable to standard generation. This work offers a novel way to increase control over LRM generation and a new tool for studying LRM behaviors.
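The monitoring idea described in the abstract can be illustrated with a minimal sketch: tag each newly generated reasoning step with a lightweight classifier and halt once the count of redundant step types exceeds a budget. The step labels, keyword heuristics, and threshold below are illustrative assumptions, not the paper's ReasonType taxonomy or its trained classifier.

```python
# Minimal sketch (not the authors' implementation) of step-count-based early stopping.
from collections import Counter
from typing import Iterable


def classify_step(step: str) -> str:
    """Hypothetical lightweight sentence classifier.

    A real system would run a trained model here; this stub uses keyword
    heuristics purely so the example executes.
    """
    lowered = step.lower()
    if "verify" in lowered or "double-check" in lowered:
        return "verification"
    if "wait" in lowered or "on second thought" in lowered:
        return "self-reflection"
    return "derivation"


def monitored_generation(steps: Iterable[str], max_redundant: int = 3) -> list[str]:
    """Consume reasoning steps one by one, tagging each in real time.

    Stops early once the combined count of verification and self-reflection
    steps exceeds `max_redundant` (an assumed, tunable budget); in practice
    the model would then be prompted to emit its final answer.
    """
    counts: Counter = Counter()
    kept: list[str] = []
    for step in steps:
        label = classify_step(step)
        counts[label] += 1
        kept.append(step)
        if counts["verification"] + counts["self-reflection"] > max_redundant:
            break  # interpretable early stop: too many redundant steps
    return kept


if __name__ == "__main__":
    fake_trace = [
        "Compute 3 * 17 = 51.",
        "Let me verify that multiplication.",
        "Wait, on second thought, re-read the question.",
        "Let me verify the final answer once more.",
        "Double-check the units.",
        "Therefore the answer is 51.",
    ]
    print(monitored_generation(fake_trace, max_redundant=3))
```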
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 17174