Keywords: asynchronous tool call; benchmark
Abstract: Agents based on large language models (LLMs) have demonstrated strong proficiency in leveraging external tools to solve complex problems. However, existing evaluations largely overlook the temporal dimension of tool invocation, particularly the practical impact of inherent tool response latency, and they are typically confined to single-task scenarios. In realistic applications, tasks often need to be executed in parallel, and overall efficiency critically depends on the ability to utilize idle time during tool response delays. We denote this capability asynchronous tool calling. To address the lack of evaluation in this area, we propose ASYNCTOOL, which is, to the best of our knowledge, the first benchmark specifically designed to assess the asynchronous multitasking abilities of LLM-based agents in interactive tool-use settings. ASYNCTOOL consists of composite tasks with intra-task step dependencies that must be executed concurrently under realistic tool response delays. Through a hybrid data evolution strategy, we construct a diverse and representative asynchronous multitasking dataset that covers multiple scenarios and exhibits a wide range of tool-use patterns. We further assess performance at three levels, namely the Step Level, Sub-Task Level, and Task Level, spanning fine-grained to coarse-grained perspectives. Extensive experiments on ASYNCTOOL show that even state-of-the-art models suffer notable performance degradation when confronted with complex asynchronous workflows. Our analysis identifies the main failure modes of current tool-using agents and provides practical guidelines for designing future systems with stronger temporal reasoning and coordination capabilities.
Primary Area: datasets and benchmarks
Submission Number: 2352