Large Language Models are Zero-Shot Reasoners

01 Jun 2022 (modified: 22 Oct 2023)
ICML 2022 Workshop KRLM
Readers: Everyone
Keywords: chain of thought (CoT), zero-shot learning, multi-step reasoning, arithmetic, commonsense reasoning, prompting, large language models (LLMs)
TL;DR: We propose a single zero-shot prompt that elicits effective chain of thought reasoning across diverse benchmarks that require multi-step thinking.
Abstract: Chain of thought (CoT) prompting, a recent technique for eliciting multi-step reasoning through step-by-step answer examples, achieved state-of-the-art performance in arithmetic and symbolic reasoning. While these successes are often attributed to LLMs' few-shot learning ability, we show that LLMs are decent zero-shot reasoners by simply adding "Let's think step by step" before each answer. Experimental results demonstrate that our Zero-shot-CoT, using the same single prompt template, significantly outperforms zero-shot LLM performance on diverse benchmark reasoning tasks, including arithmetic (MultiArith, GSM8K, AQUA-RAT, SVAMP), symbolic reasoning (Last Letter, Coin Flip), and other logical reasoning tasks (Date Understanding, Tracking Shuffled Objects), without any hand-crafted few-shot examples. For example, it increases accuracy on MultiArith from 17.7% to 78.7% and on GSM8K from 10.4% to 40.7% with a 175B-parameter InstructGPT, with improvements of similar magnitude on the 540B-parameter PaLM. The versatility of this single prompt across very diverse reasoning tasks hints at untapped and understudied fundamental zero-shot capabilities of LLMs, suggesting that high-level, multi-task, broad cognitive capabilities may be extracted through simple prompting. We hope our work not only serves as the strongest minimal zero-shot baseline for challenging reasoning benchmarks, but also highlights the importance of carefully exploring and analyzing the enormous zero-shot knowledge hidden inside LLMs.
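To make the "single prompt template" idea concrete, here is a minimal sketch of how a Zero-shot-CoT query might be assembled. The abstract specifies only the trigger phrase placed before the answer; everything else here is an assumption of this sketch: the "Q:/A:" layout, the second answer-extraction prompt and its exact wording (`ANSWER_TRIGGER`), the numeric parser, and the `generate` callable standing in for whatever LLM completion call you use. `fake_model` is a hypothetical stub so the example runs offline.

```python
import re
from typing import Callable, Optional

# Trigger phrase quoted in the abstract, placed right before the answer.
REASONING_TRIGGER = "Let's think step by step."
# Assumption: cue used to pull a final numeric answer out of the reasoning.
ANSWER_TRIGGER = "Therefore, the answer (arabic numerals) is"


def build_reasoning_prompt(question: str) -> str:
    """Stage 1: the same template for every task, with no few-shot examples."""
    return f"Q: {question}\nA: {REASONING_TRIGGER}"


def build_answer_prompt(question: str, reasoning: str) -> str:
    """Stage 2 (assumed): append the generated reasoning, then the answer cue."""
    return f"{build_reasoning_prompt(question)} {reasoning}\n{ANSWER_TRIGGER}"


def parse_number(text: str) -> Optional[str]:
    """Return the first number in the answer text (commas stripped), else None."""
    match = re.search(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return match.group(0) if match else None


def zero_shot_cot(question: str, generate: Callable[[str], str]) -> Optional[str]:
    """Run both stages with any text-completion function `generate` (hypothetical)."""
    reasoning = generate(build_reasoning_prompt(question))
    answer_text = generate(build_answer_prompt(question, reasoning))
    return parse_number(answer_text)


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM, hard-coded for the demo question."""
    if prompt.endswith(ANSWER_TRIGGER):
        return " 4."
    return ("There are 16 balls in total. Half of them are golf balls, so 8. "
            "Half of the golf balls are blue, so there are 4 blue golf balls.")


if __name__ == "__main__":
    q = ("A juggler can juggle 16 balls. Half of the balls are golf balls, "
         "and half of the golf balls are blue. How many blue golf balls are there?")
    print(build_reasoning_prompt(q))
    print(zero_shot_cot(q, fake_model))  # -> 4
```

Replacing `fake_model` with a real completion call is the only change needed to run this against a model; the reasoning template itself stays fixed across tasks, which is the point the abstract makes. The numeric parser only suits arithmetic-style answers; multiple-choice tasks such as AQUA-RAT would need a different extraction step.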
Community Implementations: [8 code implementations](https://www.catalyzex.com/paper/arxiv:2205.11916/code) (CatalyzeX)