Keywords: Reward Modeling, Large Language Models, Tool Use
Abstract: As large language models (LLMs) increasingly interact with external tools, reward modeling for tool use has become a critical yet underexplored area. Existing reward models, trained primarily on natural language outputs, struggle to evaluate tool-based reasoning and execution. To quantify this gap, we introduce FC-RewardBench, the first benchmark designed to systematically evaluate reward models in tool-calling scenarios. Our analysis shows that current reward models often miss key signals of effective tool use, highlighting the need for domain-specific reward modeling. To address this, we propose a training framework for outcome reward models using data synthesized from permissively licensed, open-weight LLMs. We train models ranging from 1.7B to 14B parameters and evaluate them across seven out-of-domain benchmarks. These reward models consistently outperform general-purpose baselines, yielding up to a 25% average improvement in downstream task performance, enhancing robustness to input noise, and enabling data-efficient fine-tuning through reward-guided filtering.
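The reward-guided filtering mentioned above can be illustrated with a minimal sketch: score each candidate tool-calling trace with an outcome reward model and keep only high-scoring examples for fine-tuning. All names below (score_tool_call, the data format, the 0.8 threshold) are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of reward-guided data filtering for tool-calling traces.
# The scorer here is a stub; a real outcome reward model would return a
# scalar reward for a (prompt, tool_call) pair.

from typing import Callable


def score_tool_call(prompt: str, tool_call: str) -> float:
    """Placeholder outcome reward model (hypothetical)."""
    return 1.0 if "get_weather" in tool_call else 0.0


def filter_by_reward(
    examples: list[dict],
    scorer: Callable[[str, str], float],
    threshold: float = 0.8,
) -> list[dict]:
    """Keep only examples whose candidate tool call scores at or above the
    threshold, yielding a smaller, higher-quality fine-tuning set."""
    return [
        ex for ex in examples
        if scorer(ex["prompt"], ex["tool_call"]) >= threshold
    ]


if __name__ == "__main__":
    candidates = [
        {"prompt": "What's the weather in Paris?",
         "tool_call": 'get_weather(city="Paris")'},
        {"prompt": "What's the weather in Paris?",
         "tool_call": 'search_web(query="Paris")'},
    ]
    kept = filter_by_reward(candidates, score_tool_call)
    print(f"Kept {len(kept)} of {len(candidates)} candidates for fine-tuning")
```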
Primary Area: foundation or frontier models, including LLMs
Submission Number: 14211