Agent Instructs Large Language Models to be General Zero-Shot Reasoners

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: transfer learning, meta learning, and lifelong learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Language Models, Large Language Models, Reasoning, Agents, Chain of Thought
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We introduce a method to improve the zero-shot reasoning abilities of large language models on general language understanding tasks.
Abstract: We introduce a method to improve the zero-shot reasoning abilities of large language models on general language understanding tasks. Unlike existing zero-shot reasoning approaches, which are often suboptimal for general tasks, we build an autonomous agent that generates task-specific instructions to optimize the reasoning performance of large language models. We show that our agent's instructions extend the zero-shot reasoning abilities of large language models to a broader range of tasks. We study the performance of our method on a wide set of datasets spanning generation, classification, and reasoning. Our method generalizes to most tasks and obtains state-of-the-art zero-shot performance on 20 of the 29 datasets that we evaluate. For instance, our method boosts the performance of state-of-the-art large language models by a large margin, including Vicuna-13b (13.3%), Llama-2-70b-chat (23.2%), and GPT-3.5 Turbo (17.0%). Compared to zero-shot chain of thought, our improvement in reasoning is striking, with an average increase of 10.4%. With our method, Llama-2-70b-chat outperforms zero-shot GPT-3.5 Turbo by 10.2%. The code is available at https://anonymous.4open.science/r/AgentInstruct_ICLR2024.
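The abstract describes a two-stage pipeline: an agent first produces a task-specific instruction, which is then prepended to each input so the model can answer zero-shot. A minimal sketch of that flow, where `generate_task_instruction` and `build_zero_shot_prompt` are hypothetical stand-ins (not names from the paper) and the agent's LLM call is stubbed out:

```python
# Hypothetical sketch of an agent-instruction pipeline: an agent step
# derives a task-specific instruction, which is prepended to each input
# before the zero-shot LLM call. Function names are illustrative only.

def generate_task_instruction(task_description: str) -> str:
    """Agent step: derive a task-specific instruction (stubbed here).

    A real agent would call an LLM, possibly with tool use, to produce
    this string; we return a fixed template for illustration.
    """
    return (
        f"Instructions for this task ({task_description}): "
        "reason step by step, then state the final answer."
    )

def build_zero_shot_prompt(instruction: str, task_input: str) -> str:
    """Compose the final zero-shot prompt from instruction + input."""
    return f"{instruction}\n\nInput: {task_input}\nAnswer:"

if __name__ == "__main__":
    instr = generate_task_instruction("sentiment classification")
    prompt = build_zero_shot_prompt(instr, "The movie was a delight.")
    print(prompt)
```

The same agent-generated instruction is reused across all inputs of a task, so the (more expensive) agent step runs once per task rather than once per example.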
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5899