TATTO: Tool-Augmented Thinking PRM for Tabular Reasoning

Published: 23 Sept 2025, Last Modified: 07 Dec 2025
Venue: FoRLM 2025
Readers: Everyone
License: CC BY 4.0
Keywords: Tabular Reasoning, Process Reward Model, Tool Integration, Test-time Scaling
TL;DR: A tool-augmented process reward model that improves tabular reasoning at test time.
Abstract: Test-time scaling (TTS) has emerged as a promising paradigm for enhancing reasoning in large reasoning models (LRMs) by allocating additional inference-time compute. However, its potential for tabular reasoning remains underexplored. We find that existing process reward models (PRMs), which are widely used to supervise reasoning steps, struggle with table-specific operations such as table retrieval and schema interaction, which bottlenecks performance under TTS. To address this gap, we propose TATTO, the first table-grounded PRM framework that leverages tool use for accurate verification. We develop a scalable data curation pipeline that produces over 60k high-quality step-level annotations combining expert rationales with programmatic tool executions, and we train our tabular PRM via supervised fine-tuning followed by reinforcement learning with tool-grounded reward shaping. We provide both theoretical analyses and empirical evaluations of the efficacy of our method. Across five challenging tabular reasoning benchmarks, our TATTO-8B PRM achieves an average 30.9% relative gain over the base LRM, consistently surpasses strong baselines such as Qwen-2.5-Math-PRM-72B with up to 9× parameter efficiency, and generalizes robustly across multiple TTS strategies.
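For readers unfamiliar with how a PRM is used under TTS, the following is a minimal sketch of Best-of-N selection, one common TTS strategy. It is a generic illustration and not the TATTO implementation: the names `aggregate`, `best_of_n`, and `toy_prm` are hypothetical, the min-over-steps aggregation is an assumed choice, and the toy scorer merely stands in for a trained step-level reward model.

```python
from typing import Callable, List, Sequence

Steps = List[str]  # one candidate solution = an ordered list of reasoning steps


def aggregate(step_scores: Sequence[float]) -> float:
    """Collapse per-step rewards into one trajectory score.

    Assumption: use the minimum, so a single weak step sinks the candidate.
    """
    if not step_scores:
        raise ValueError("a candidate must contain at least one step")
    return min(step_scores)


def best_of_n(
    candidates: Sequence[Steps],
    score_steps: Callable[[Steps], List[float]],
) -> Steps:
    """Return the candidate whose aggregated step-level reward is highest."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda c: aggregate(score_steps(c)))


def toy_prm(steps: Steps) -> List[float]:
    """Hypothetical stand-in for a PRM: steps marked '??' get a low reward."""
    return [0.2 if "??" in step else 0.9 for step in steps]


candidates = [
    ["select column revenue", "sum rows ?? (wrong filter)", "answer: 120"],
    ["select column revenue", "filter year == 2024", "answer: 95"],
]
best = best_of_n(candidates, toy_prm)
print(best[-1])
```

In this sketch the scorer is passed in as a plain callable, so a real PRM (with or without tool calls for verifying table operations) could be substituted without changing the selection logic.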
Submission Number: 175