Abstract: Tool calling enables large language models (LLMs) to interact with external systems, such as APIs and databases, significantly extending their capabilities beyond text generation. This functionality is critical for applications such as customer support, data analysis, and dynamic content generation. While recent advances have improved LLM performance on tool-invocation tasks, challenges persist, particularly with datasets that rely on simulated or inaccessible APIs and that are often limited in geographical diversity. To address these issues, we introduce the International Tool Calling (ITC) dataset, designed specifically for international tool-calling scenarios. The ITC dataset includes 3,571 APIs and 17,540 tool-calling tasks, with APIs spanning 20 categories and broad geographical representation from 40 countries. We propose a four-stage pipeline to construct the dataset, incorporating techniques such as bias sampling and tool fusion, and use advanced models to refine queries into high-quality tasks. Experimental results demonstrate significant performance variations between open-source and closed-source LLMs, highlighting the dataset's potential to identify key strengths and weaknesses in tool-calling tasks. Additionally, fine-tuning open-source LLMs on the ITC dataset yields substantial performance improvements on both in-distribution and out-of-distribution data. Our findings show that the ITC dataset provides a valuable resource for training LLMs in complex international, multi-tool contexts. The data is available at~\url{https://anonymous.4open.science/r/International-Tool-Calling-ITC-dataset-5FD7/}.
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: Language Modeling, NLP Applications
Contribution Types: Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English, Chinese, Japanese
Submission Number: 7515