TRACE: A Framework for Analyzing and Enhancing Stepwise Reasoning in Vision-Language Models

Published: 17 Oct 2025 · Last Modified: 21 Nov 2025 · MATH-AI 2025 Poster · CC BY 4.0
Keywords: vision-language models, reasoning evaluation, error localization, interpretability, mathematical reasoning, scientific reasoning
TL;DR: TRACE provides fine-grained evaluation of vision–language models by decomposing reasoning into interpretable steps, enabling error detection and localization of failure points.
Abstract: Reliable mathematical and scientific reasoning remains an open challenge for large vision–language models (VLMs). Standard final-answer evaluation often masks reasoning errors, allowing silent failures to persist. To address this gap, we introduce TRACE (Transparent Reasoning And Consistency Evaluation), a framework for analyzing, diagnosing, and improving reasoning in VLMs. At its core, TRACE leverages Auxiliary Reasoning Sets (ARS): compact sub-question–answer pairs that decompose a complex problem into interpretable steps. TRACE evaluates these intermediate steps with consistency-based metrics, exposing failures that standard evaluation overlooks. Our experiments show that consistency across an ARS correlates with final-answer correctness and helps pinpoint the reasoning steps where failures arise, offering actionable signals for model improvement.
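
The abstract describes ARS-based consistency scoring only at a high level. As a minimal sketch of the idea (not the authors' implementation), the core check might look like the following; the SubQA structure and the query_vlm and agree callables are hypothetical placeholders standing in for a model call and an answer-matching predicate.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SubQA:
    """One Auxiliary Reasoning Set item: a sub-question and its reference answer."""
    question: str
    reference: str

def ars_consistency(
    problem: str,
    ars: List[SubQA],
    query_vlm: Callable[[str], str],    # hypothetical: VLM call returning an answer string
    agree: Callable[[str, str], bool],  # hypothetical: predicate matching two answers
) -> float:
    """Fraction of ARS sub-questions the model answers consistently with the references."""
    if not ars:
        return 1.0
    hits = 0
    for item in ars:
        prediction = query_vlm(f"{problem}\nSub-question: {item.question}")
        hits += agree(prediction, item.reference)  # bool counts as 0 or 1
    return hits / len(ars)
```

Under this reading, a low score flags an inconsistent reasoning chain, and the first disagreeing sub-question localizes where the failure arises, matching the abstract's claim that ARS consistency tracks final-answer correctness.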
Submission Number: 46