Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: large language models, tool making, tool using, serving efficiency
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Recent research has highlighted the potential of large language models (LLMs)
to improve their problem-solving capabilities with the aid of suitable external
tools. In our work, we further advance this concept by introducing a closed-
loop framework, referred to as LLMs As Tool Makers (LATM), where LLMs
create their own reusable tools for problem-solving. Our approach consists of two
phases: 1) tool making: an LLM acts as the tool maker that crafts tools for a set
of tasks, where a tool is implemented as a Python utility function. 2) tool using:
another LLM acts as the tool user, which applies the tool built by the tool maker
for problem-solving. The tool user can be the same LLM as the tool maker or a
different one. On the problem-solving server side, tool-making enables
continual tool generation and caching as new requests emerge. This framework
enables subsequent requests to access cached tools via their corresponding APIs,
enhancing the efficiency of task resolution. Beyond enabling LLMs to create their
own tools, our framework also uncovers intriguing opportunities to optimize the
serving cost of LLMs: Recognizing that tool-making requires more sophisticated
capabilities, we assign this task to a powerful, albeit resource-intensive, model.
Conversely, the simpler tool-using phase is delegated to a lightweight model. This
strategic division of labor allows the one-time cost of tool-making to be
amortized over multiple instances of tool-using, significantly reducing the
average cost while
maintaining strong performance. Furthermore, by caching and reusing tools, our
method offers a functional cache: it stores the functionality shared by a class
of requests rather than individual natural-language responses from LLMs, thus
extending the applicability of the conventional cache mechanism. We evaluate
our approach on a range of complex reasoning tasks, including tasks from BIG-Bench.
With GPT-4 as the tool maker and GPT-3.5 as the tool user, LATM demonstrates
performance equivalent to using GPT-4 for both roles, but with a significantly
reduced inference cost.
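
To make the two-phase pipeline concrete, the sketch below outlines one possible implementation of the LATM loop with a functional cache. The `call_llm` helper, the model names, the prompts, and the cache layout are illustrative assumptions for exposition, not the paper's exact interfaces.

```python
from typing import Dict

TOOL_CACHE: Dict[str, str] = {}  # task type -> Python source of the cached tool


def call_llm(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion API call."""
    raise NotImplementedError("wire this to an actual LLM provider")


def make_tool(task_type: str, demonstrations: str) -> str:
    """Tool making: the powerful model writes a reusable Python utility
    function from a few demonstrations of the task, which is then cached."""
    prompt = (
        "Write a self-contained Python function `solve(question)` that "
        f"solves tasks like the following:\n{demonstrations}"
    )
    tool_source = call_llm("strong-model", prompt)  # e.g. GPT-4 in the paper
    TOOL_CACHE[task_type] = tool_source             # the functional cache
    return tool_source


def use_tool(task_type: str, question: str) -> str:
    """Tool using: the lightweight model only translates the new request
    into a call to the cached tool instead of solving it from scratch."""
    tool_source = TOOL_CACHE[task_type]
    call_snippet = call_llm(                         # e.g. GPT-3.5 in the paper
        "light-model",
        f"Given this tool:\n{tool_source}\n"
        f"Write a single Python expression calling solve(...) for:\n{question}",
    )
    namespace: dict = {}
    exec(tool_source, namespace)           # load the cached tool
    return eval(call_snippet, namespace)   # run the tool user's invocation
```

In this sketch, `make_tool` runs once per class of tasks while `use_tool` runs once per request, which is where the amortization of the one-time tool-making cost over many tool-using calls comes from.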
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Primary Area: general machine learning (i.e., none of the above)
Submission Number: 4344