JTPRO: A Joint Tool–Prompt Reflective Optimization Framework for Language Agents

ACL ARR 2026 January Submission 8532 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Tool-using LLM agents, tool selection, slot filling, tool schema optimization, prompt optimization, context optimization, reflection-based optimization, black-box LLMs, large tool inventories, retrieval-augmented tool selection, multi-tool calling, adaptive toolset expansion, call-level evaluation, Pareto selection, instruction–schema co-adaptation
Abstract: Large language model (LLM) agents augmented with external tools often struggle as the number of tools grows large and domain-specific. In such settings, ambiguous tool descriptions and under-specified agent instructions frequently lead to tool mis-selection and incorrect slot/value instantiation. We hypothesize that this stems from two root causes: generic, one-size-fits-all prompts that ignore tool-specific nuances, and underspecified tool schemas that lack clear guidance on when and how to use each tool and how to format its parameters. We introduce Joint Tool–Prompt Reflective Optimization (JTPRO), a framework that uses rollout-driven reflection to iteratively co-optimize global instructions and per-tool schema/argument descriptions, guided by observed tool-confusion and slot/formatting errors. JTPRO is designed to preserve only the tool-local cues needed for correct disambiguation and slot filling. We evaluate JTPRO across multi-tool benchmarks spanning different numbers of tools, using three metrics: Tool Selection Accuracy (TSA), Slot Filling Accuracy (SFA), and Overall Success Rate (OSR; correct tool + correct slots + correct values). JTPRO consistently outperforms strong baselines, including CoT-style agents and prompt optimizers such as GEPA, by 5\%–20\% (relative) on OSR. Ablations show that joint optimization of instructions and tool schemas is more effective and robust than optimizing either component in isolation.
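The three call-level metrics named in the abstract can be sketched as follows. This is a hypothetical illustration, not the authors' evaluation code: the dict layout (`tool` name plus an `args` slot-to-value mapping) is an assumption, and SFA is read here as checking slot names only, while OSR requires correct tool, slots, and values together.

```python
def call_metrics(predictions, references):
    """Compute (TSA, SFA, OSR) over paired tool calls.

    predictions/references: lists of dicts with keys
    'tool' (str) and 'args' (dict mapping slot name -> value).
    Layout is illustrative, not taken from the paper.
    """
    n = len(references)
    tsa = sfa = osr = 0
    for pred, ref in zip(predictions, references):
        tool_ok = pred["tool"] == ref["tool"]          # correct tool selected
        slots_ok = set(pred["args"]) == set(ref["args"])  # correct slot names filled
        values_ok = pred["args"] == ref["args"]        # correct slots AND values
        tsa += tool_ok
        sfa += slots_ok
        osr += tool_ok and values_ok                   # tool + slots + values all correct
    return tsa / n, sfa / n, osr / n


# Tiny usage example with one fully correct call and one
# call that picks the wrong tool and a wrong value:
refs = [
    {"tool": "search", "args": {"q": "cats"}},
    {"tool": "book", "args": {"city": "NYC", "date": "2026-01-01"}},
]
preds = [
    {"tool": "search", "args": {"q": "cats"}},
    {"tool": "search", "args": {"city": "NYC", "date": "2026-01-02"}},
]
tsa, sfa, osr = call_metrics(preds, refs)
print(tsa, sfa, osr)  # → 0.5 1.0 0.5
```

Under this reading, OSR is the strictest of the three: any single error in tool choice, slot names, or slot values fails the whole call, which is why it is the headline metric for end-to-end agent success.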
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: LLM agents, tool use, function calling
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 8532