TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems

Yilun Kong; Jingqing Ruan; YiHong Chen; Bin Zhang; Tianpeng Bao; shi shiwei; du guo qing; xiaoru hu; Hangyu Mao; Ziyue Li; Xingyu Zeng; Rui Zhao; Xueqian Wang

Published: 11 Mar 2024, Last Modified: 22 Apr 2024. Venue: LLMAgents @ ICLR 2024 (Poster). License: CC BY 4.0

Keywords: AI Agent, tool usage, task planning, real-world application

Abstract: Large Language Models (LLMs) have demonstrated proficiency in addressing tasks that require a combination of task planning and the use of external tools, such as weather and calculator APIs. However, real-world complex systems present three prevalent challenges for task planning and tool usage: (1) a real system usually has numerous APIs, so it is impractical to feed the descriptions of all APIs into the LLM prompt, as the token length is limited; (2) a real system is designed to handle complex tasks, and base LLMs can hardly plan a correct sub-task order and API-calling order for such tasks; (3) APIs in real systems often have similar semantics and functionalities, which makes them hard to distinguish for LLMs and even for humans. In response, this paper introduces a comprehensive framework aimed at enhancing the Task Planning and Tool Usage (TPTU) abilities of LLM-based agents within real-world systems. Our framework comprises three key components designed to address these challenges: (1) the API Retriever selects the most pertinent APIs from the extensive API set; (2) the LLM Finetuner tunes a base LLM to enhance its capability for task planning and API calling; (3) the Demo Selector retrieves demonstrations related to hard-to-distinguish APIs, which are then used for in-context learning to boost the final performance. We validate our methods using a real-world industry system and an open-sourced academic dataset, demonstrating the efficacy of each individual component as well as of the integrated framework.
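
As a rough illustration of the retrieval idea behind the API Retriever (pick a few relevant APIs so that only their descriptions go into the prompt), the sketch below ranks API descriptions against a task by cosine similarity over word counts. This is not the paper's implementation: the abstract does not say how the retriever scores APIs, so bag-of-words similarity is a stand-in, and the function names (`retrieve_apis`, `tokenize`, `cosine`) and the toy API catalogue are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_apis(task, api_descriptions, k=2):
    """Return the names of the k APIs whose descriptions best match the task.

    Assumption: bag-of-words cosine similarity stands in for whatever
    scoring the real retriever uses.
    """
    query = Counter(tokenize(task))
    scored = [
        (cosine(query, Counter(tokenize(desc))), name)
        for name, desc in api_descriptions.items()
    ]
    # Highest score first; ties are broken alphabetically by API name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


# Hypothetical toy catalogue; a real system would have far more APIs.
APIS = {
    "get_weather": "Return the current weather forecast for a city",
    "calculator": "Evaluate an arithmetic expression and return the number",
    "get_stock_price": "Return the latest stock price for a ticker symbol",
}

selected = retrieve_apis("What is the weather forecast in Paris", APIS, k=1)
print(selected)
```

Only the descriptions of the selected APIs would then be placed in the prompt, which is what keeps the prompt within the token limit when the full API set is large.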

Submission Number: 47
