FireAct: Toward Language Agent Finetuning

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: language agent, language model, large language model, finetuning, agent, tool use
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Fine-tuning language models for agents is understudied, so we present a systematic study.
Abstract: Recent efforts have augmented language models (LMs) with external tools or environments, leading to the development of language agents that can reason and act. However, most of these agents rely on few-shot prompting techniques, which provide limited learning signal and can result in brittle agent performance. In this paper, we investigate the less explored direction of fine-tuning LMs to obtain language agents. With a simple, controlled setup that uses a Google search API for question answering (QA), we systematically explore a variety of base LMs, agent methods, fine-tuning data, and QA tasks. Our experiments reveal novel insights into the scaling effects of the base LM and fine-tuning data, the benefits of combining trajectory data collected from different tasks and agent methods, and robustness to different types of data perturbations. Overall, these findings illustrate overlooked advantages of fine-tuned language agents over existing prompting-based ones, provide empirical guidelines for fine-tuning, and indicate future directions for creating better tasks and methods for language agents.
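For readers unfamiliar with the setup the abstract describes, the sketch below illustrates a ReAct-style, search-augmented QA agent loop of the kind such studies fine-tune or prompt. The `llm` and `search` callables, the action names, and the step budget are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a ReAct-style QA agent that alternates Thought/Action/Observation.
# Assumptions (not from the paper): `search(query)` wraps a Google search API and
# `llm(prompt)` calls either a few-shot-prompted or a fine-tuned LM; both are
# hypothetical stand-ins supplied by the caller.
import re


def react_qa_agent(question, llm, search, max_steps=6):
    """Run the agent loop and return (answer, trajectory)."""
    trajectory = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(trajectory)  # model emits "Thought: ... Action: tool[arg]"
        trajectory += step + "\n"
        match = re.search(r"Action:\s*(\w+)\[(.*?)\]", step)
        if not match:
            break
        action, argument = match.groups()
        if action == "finish":  # agent returns its final answer
            return argument, trajectory
        if action == "search":  # agent queries the external tool
            trajectory += f"Observation: {search(argument)}\n"
    return None, trajectory  # no answer within the step budget
```

In a fine-tuning study of this kind, trajectories produced by such a loop (from a stronger teacher or a prompted agent) would serve as training data for the base LM, whereas a prompting-only baseline would rely on a handful of in-context examples at inference time.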
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6842