An Evaluation Mechanism of LLM-based Agents on Manipulating APIs

ACL ARR 2024 June Submission 5041 Authors

16 Jun 2024 (modified: 03 Jul 2024), ACL ARR 2024 June Submission, License: CC BY 4.0
Abstract: LLM-based agents can greatly extend the capabilities of LLMs and have thus attracted a surge of research interest. An ambitious vision has been proposed and explored: serving users by manipulating a massive collection of API-based tools. However, we find that a widely accepted evaluation mechanism for such generic agents is still missing. This work aims to fill this gap. We decompose tool-use capability into seven aspects and form a thorough evaluation schema. In addition, we design and release an instruction dataset and a toolset, the two sides that agents bridge, following the principle of reflecting real-world challenges. Furthermore, we evaluate multiple generic agents. Our findings can inspire future research on improving LLM-based agents and prompt a rethinking of the philosophy of API design.
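
To make the setting concrete, below is a minimal sketch of the instruction-to-API-call scoring loop the abstract describes. It assumes a simple exact-match criterion; all names (`ToolCall`, `evaluate`, the stub agent) are hypothetical illustrations, not the paper's actual schema or its seven evaluation aspects.

```python
# Hypothetical sketch: an agent maps a user instruction to an API call, and a
# harness scores the call against a reference. Exact-match scoring is an
# assumption for illustration; the paper's schema is richer than this.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ToolCall:
    api_name: str
    arguments: Dict[str, str]


def evaluate(agent: Callable[[str], ToolCall],
             instruction: str,
             reference: ToolCall) -> Dict[str, bool]:
    """Score one instruction: did the agent select the right API
    and supply the right arguments?"""
    predicted = agent(instruction)
    return {
        "api_selected": predicted.api_name == reference.api_name,
        "args_correct": predicted.arguments == reference.arguments,
    }


if __name__ == "__main__":
    # A stub standing in for an LLM-based agent.
    stub = lambda _: ToolCall("weather.lookup", {"city": "Paris"})
    print(evaluate(stub, "What's the weather in Paris?",
                   ToolCall("weather.lookup", {"city": "Paris"})))
    # {'api_selected': True, 'args_correct': True}
```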
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: NLP datasets, evaluation methodologies, evaluation
Contribution Types: Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 5041