Tracing the Traces: Latent-Space Metrics for Efficient and Accurate Reasoning

Published: 23 Sept 2025, Last Modified: 07 Dec 2025, FoRLM 2025, CC BY 4.0
Keywords: Reasoning Models, Test-Time Scaling, Mechanistic Interpretability, Representational Analysis
Abstract: Reasoning models rely on inference-time scaling, allocating more compute via longer token budgets to improve problem-solving. Identifying traces that reliably lead to correct answers is a key step toward improving the reliability and efficiency of these models. In this work, we propose Latent-Space Metrics that track the shifts in internal representations during the generation of intermediate reasoning tokens. We introduce a set of trajectory metrics that quantify both the magnitude of hidden-state changes and the geometry of their trajectories along the reasoning trace. We show that metrics tracking the model’s internal states, rather than its output tokens, can serve as strong predictors of final answer accuracy. Our results demonstrate that they consistently distinguish correct from incorrect traces across models and reasoning domains. Moreover, we show that they enable more effective and efficient test-time scaling strategies, reducing token usage by up to 70% while preserving accuracy and even improving it by 2.6% on average.
Submission Number: 191
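
To make the abstract's notion of trajectory metrics concrete, the following is a minimal sketch of how the magnitude and the geometry of a hidden-state trajectory could be summarized. It is an illustration only: the function name `trajectory_metrics` and the three quantities (`net_change`, `path_length`, `straightness`) are assumptions introduced here and are not the definitions used in the paper. Hidden states are assumed to be given as one vector per reasoning step.

```python
import math


def _norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def _diff(a, b):
    """Element-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]


def trajectory_metrics(hidden_states):
    """Summarize a hidden-state trajectory (hypothetical metrics, not the paper's).

    hidden_states: list of equal-length vectors, one per reasoning step.
    Returns a dict with:
      net_change   - distance between the first and last hidden state (magnitude)
      path_length  - sum of step-to-step distances along the trace (magnitude)
      straightness - net_change / path_length, in [0, 1] (geometry)
    """
    if len(hidden_states) < 2:
        raise ValueError("need at least two hidden states")
    # Displacement between each pair of consecutive hidden states.
    steps = [_diff(b, a) for a, b in zip(hidden_states, hidden_states[1:])]
    path_length = sum(_norm(s) for s in steps)
    net_change = _norm(_diff(hidden_states[-1], hidden_states[0]))
    # A trajectory that never moves has no defined direction; report 0.0.
    straightness = net_change / path_length if path_length > 0 else 0.0
    return {
        "net_change": net_change,
        "path_length": path_length,
        "straightness": straightness,
    }


straight = trajectory_metrics([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
detour = trajectory_metrics([[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]])
```

Under these assumed definitions, a trace whose hidden states move in a straight line has a straightness of 1, while a trace that wanders and returns to its starting point has a straightness of 0. Whether such quantities separate correct from incorrect traces is an empirical question that the paper addresses with its own metrics.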