ToolCPT: Improving Tool Utilization in LLM Agents via Continuous Pre-training

ACL ARR 2026 January Submission1740 Authors

31 Dec 2025 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: LLM-based Agent, Tool Knowledge, Continuous Pre-training
Abstract: Autonomous agents powered by large language models (LLM-based agents) can use off-the-shelf tools to interact with the environment, solve real-world problems, and boost work efficiency. However, current approaches to enhancing tool use for LLM-based agents focus primarily on post-training fine-tuning or test-time context extension. These methods overlook fundamental tool knowledge acquisition during the early training phase, where models actually learn and internalize core knowledge representations, which restricts model performance on out-of-distribution tool usage. To address this problem, we introduce \textbf{ToolCPT}, which enhances \textbf{tool} knowledge for LLM-based agents during \textbf{c}ontinuous \textbf{p}re-\textbf{t}raining. We identify and bridge a key gap in current LLM training by shifting the focus from surface-level tool-calling patterns to deep internalization of core tool-knowledge representations. We begin by curating 5.1 million code artifacts from large-scale, high-quality code repositories. These artifacts are selected according to criteria that define a usable "proxy agent tool", forming a comprehensive agent tool library. For each proxy tool, we then create a detailed playbook covering implementation specifications, core functionalities, interaction protocols with other tools, and illustrative positive and negative examples. This process yields a tool knowledge corpus of 18 billion tokens, which we use to continuously pre-train our model. Experiments show that our playbook-enhanced corpus catalyzes deep knowledge internalization, driving notable performance gains on multiple standard benchmarks.
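The abstract's pipeline (filter code artifacts by "usable proxy tool" criteria, then build a multi-section playbook per tool) can be sketched as below. This is a minimal illustration, not the paper's implementation: the `ProxyTool` fields, the filtering criteria in `is_usable_proxy_tool`, and the playbook section names are all assumptions for exposition.

```python
from dataclasses import dataclass

@dataclass
class ProxyTool:
    # Hypothetical record for one curated code artifact.
    name: str
    source: str       # the artifact's code
    docstring: str    # its natural-language description

def is_usable_proxy_tool(tool: ProxyTool) -> bool:
    # Illustrative selection criteria (assumed, not from the paper):
    # a usable proxy tool has a name, non-empty source, and a
    # docstring substantial enough to describe its functionality.
    return bool(tool.name) and bool(tool.source) and len(tool.docstring) >= 10

def build_playbook(tool: ProxyTool) -> dict:
    # Assemble the playbook sections the abstract enumerates:
    # implementation spec, core functionality, interaction protocol,
    # and positive/negative usage examples (left empty here).
    return {
        "tool": tool.name,
        "implementation_spec": tool.source,
        "core_functionality": tool.docstring,
        "interaction_protocol": f"How {tool.name} composes with other tools.",
        "positive_examples": [],
        "negative_examples": [],
    }

# Usage: filter one candidate artifact and, if it qualifies,
# emit its playbook for the pre-training corpus.
candidate = ProxyTool(
    name="parse_csv",
    source="def parse_csv(path): ...",
    docstring="Read a CSV file and return rows as dictionaries.",
)
if is_usable_proxy_tool(candidate):
    playbook = build_playbook(candidate)
    print(playbook["tool"])
```

In the actual corpus construction, each playbook would presumably be serialized to text and concatenated into the 18B-token continuous pre-training corpus.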
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: Dialogue and Interactive Systems, Language Modeling
Contribution Types: Data resources
Languages Studied: English, Chinese
Submission Number: 1740