Supplementary Material: zip
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: language agent, interactive NLP, tool-augmented LLM
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: In this paper, we present LUMOS, **L**anguage agents with **U**nified formats, **M**odular design, and **O**pen **S**ource LLMs. LUMOS features a modular architecture consisting of planning, grounding, and execution modules built on open-source LLMs such as LLAMA-2. The planning module decomposes a task into a sequence of high-level subgoals; the grounding module then maps the generated subgoals to a series of low-level actions, which the execution module carries out. To obtain high-quality annotations for training these modules, we leverage LLMs to convert ground-truth intermediate reasoning steps in existing benchmarks into a unified format that can be used in the LUMOS framework. LUMOS achieves competitive or superior performance compared to the state of the art on a variety of complex interactive tasks. We observe: (1) LUMOS is competitive with LLM agents that are 2-4× larger on maths tasks, and outperforms GPT-4/3.5-based agents on complex QA and web agent tasks; (2) LUMOS shows superior performance against open-source agent baseline formulations, including chain-of-thought fine-tuning and unmodularized training; (3) LUMOS surpasses larger LLM-based agents on an unseen interactive task, WebShop, and achieves a 5-10 reward improvement over domain-specific agents.
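The plan, ground, execute flow described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `plan`, `ground`, `execute`, `run_agent`, `Action`, and `TOOLS` are hypothetical, and the rule-based stubs merely stand in for the fine-tuned LLM modules. It is not the LUMOS implementation or its annotation format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    """A low-level action: a tool name plus its argument (hypothetical format)."""
    tool: str
    argument: str


def plan(task: str) -> List[str]:
    # Stand-in for the planning module: split the task into subgoals
    # on the literal separator " then ". A real planner would be an LLM.
    return [s.strip() for s in task.split(" then ") if s.strip()]


def ground(subgoal: str) -> List[Action]:
    # Stand-in for the grounding module: treat the first word as the
    # tool name and the remainder as its argument.
    tool, _, argument = subgoal.partition(" ")
    return [Action(tool=tool.lower(), argument=argument)]


# Toy tool registry used by the execution step (assumed, not from the paper).
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda q: f"results for '{q}'",
    "calculate": lambda expr: str(sum(int(x) for x in expr.split("+"))),
}


def execute(action: Action) -> str:
    # Stand-in for the execution module: dispatch the action to a tool.
    return TOOLS[action.tool](action.argument)


def run_agent(task: str) -> List[str]:
    # Plan subgoals, ground each into actions, then execute every action.
    outputs: List[str] = []
    for subgoal in plan(task):
        for action in ground(subgoal):
            outputs.append(execute(action))
    return outputs


if __name__ == "__main__":
    print(run_agent("search LUMOS then calculate 2+3"))
```

Keeping the three stages as separate functions mirrors the modular design the abstract describes: each stage could be swapped for a different model or tool set without touching the others.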
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8483