Abstract: Large Language Model (LLM) agents are increasingly applied to complex, multi-step tasks that require interaction with diverse external tools across domains such as mathematics, vision, and knowledge retrieval. However, current frameworks typically rely on greedy, reactive tool selection strategies that lack foresight and fail to account for inter-tool dependencies—limiting their effectiveness on challenging, sequential workflows.
In this paper, we present a generalizable agent framework that integrates a plug-and-play Monte Carlo Tree Search (MCTS) module for deliberate tool selection. Our method explores possible tool usage trajectories using a dual-stage LLM evaluation mechanism: a pre-execution model estimates the utility of a tool in context, while a post-execution model assesses its actual contribution based on observed outcomes. This feedback loop enables the agent to make informed, adaptive decisions over extended tool-use sequences.
To ensure broad applicability, we introduce standardised “tool cards” that encapsulate domain-specific models for mathematics, vision, medicine, document analysis, and knowledge retrieval, enabling seamless orchestration across multiple domains. Empirical evaluations across 15 tasks demonstrate that our framework consistently improves downstream performance, achieving an average gain of over 5% compared to state-of-the-art agent systems.
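The dual-stage evaluation described above can be illustrated with a minimal sketch of prior-guided MCTS over tool choices. The names below (ToolNode, pre_execution_score, post_execution_score, run_tool, select_tool) are hypothetical placeholders introduced only for illustration, and both LLM evaluators are stubbed out; the paper's actual implementation may differ.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical stand-ins for the two LLM evaluators described in the abstract.
def pre_execution_score(context: str, tool: str) -> float:
    """Estimate a tool's utility before running it (stubbed with a random prior)."""
    return random.random()

def post_execution_score(context: str, tool: str, observation: str) -> float:
    """Assess a tool's actual contribution after observing its output (stubbed)."""
    return random.random()

def run_tool(tool: str, context: str) -> str:
    """Placeholder for invoking a tool card; returns a fake observation."""
    return f"{tool} output for: {context[:30]}"

@dataclass
class ToolNode:
    tool: str
    prior: float          # pre-execution utility estimate
    visits: int = 0
    value: float = 0.0    # running mean of post-execution rewards

def select_tool(context: str, tools: list[str], simulations: int = 50,
                c_uct: float = 1.4) -> str:
    """One MCTS-style decision: pick the next tool using prior-guided UCT."""
    nodes = [ToolNode(t, pre_execution_score(context, t)) for t in tools]
    for sim in range(1, simulations + 1):
        # Selection: UCT score with the pre-execution estimate as a prior bonus.
        def uct(n: ToolNode) -> float:
            if n.visits == 0:
                return float("inf")
            explore = c_uct * n.prior * math.sqrt(math.log(sim) / n.visits)
            return n.value + explore
        node = max(nodes, key=uct)
        # Simulation: execute the tool and observe its result.
        observation = run_tool(node.tool, context)
        reward = post_execution_score(context, node.tool, observation)
        # Backpropagation: update the running value estimate.
        node.visits += 1
        node.value += (reward - node.value) / node.visits
    return max(nodes, key=lambda n: n.visits).tool

if __name__ == "__main__":
    tools = ["calculator", "image_captioner", "wiki_search"]
    print(select_tool("What year was the painting in the photo created?", tools))
```

In this sketch the pre-execution score biases exploration toward tools the LLM expects to be useful, while the post-execution score supplies the reward that is backpropagated, mirroring the feedback loop described in the abstract.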
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: LLM/AI Agent, tool use for LLM, retrieval-augmented generation
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 6533