Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters

Boshi Wang; Sewon Min; Xiang Deng; Jiaming Shen; You Wu; Luke Zettlemoyer; Huan Sun

Published: 04 Mar 2023, Last Modified: 06 Jul 2025, Venue: ME-FoMo 2023 Poster, Readers: Everyone

Keywords: Chain-of-Thought prompting, Large Language Models, Multi-step Reasoning, In-context Learning

TL;DR: We find that the reasoning validity of the demonstrations matters little to the effectiveness of Chain-of-Thought prompting, and other aspects such as relevance to the query and correct orderings among the steps are the actual key.

Abstract: Chain-of-Thought (CoT) prompting, which encourages language models (LMs) to generate intermediate rationales for the final answer through in-context demonstrations, dramatically improves large LMs' ability to solve reasoning tasks. Despite its success, there is little understanding of what makes CoT prompting effective and which aspects of the demonstrated reasoning steps contribute to its performance. In this paper, we show that prompting with invalid demonstrations has little effect on CoT reasoning: it achieves over 80-90% of the performance obtained using the original CoT under various metrics, while still generating coherent lines of reasoning during inference. Further experiments show that other aspects of the rationales, such as being relevant to the query and correctly ordering the reasoning steps, are the actual key to the effectiveness of CoT. Overall, these findings deepen our understanding of CoT prompting, while raising new questions about large LMs' capability to learn to reason in context and prompting reflection on how few-shot reasoning is benchmarked.
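
To make the setup in the abstract concrete, the following is a minimal sketch of how a few-shot CoT prompt might be assembled and how the step ordering of a demonstration could be perturbed. It is not the authors' code: the `Demo` class, the function names, and the prompt template are all hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Demo:
    """One in-context demonstration (hypothetical structure)."""
    question: str
    steps: list[str]  # intermediate rationale, one sentence per step
    answer: str


def format_demo(demo: Demo) -> str:
    """Render a demonstration as question, rationale, then final answer."""
    rationale = " ".join(demo.steps)
    return f"Q: {demo.question}\nA: {rationale} The answer is {demo.answer}."


def shuffle_steps(demo: Demo, seed: int = 0) -> Demo:
    """Ablation sketch: keep the same steps but permute their order."""
    rng = random.Random(seed)
    steps = demo.steps[:]  # copy so the original demo is untouched
    rng.shuffle(steps)
    return Demo(demo.question, steps, demo.answer)


def build_prompt(demos: list[Demo], query: str) -> str:
    """Concatenate the demonstrations and append the unanswered query."""
    blocks = [format_demo(d) for d in demos]
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)
```

With a made-up demonstration such as `Demo("Tom has 3 apples and buys 2 more. How many apples does he have?", ["Tom starts with 3 apples.", "He buys 2 more, so 3 + 2 = 5."], "5")`, calling `build_prompt([demo], query)` gives a standard prompt, while `build_prompt([shuffle_steps(demo)], query)` gives a variant with the step order perturbed. Comparing model accuracy across such variants is the kind of experiment the abstract describes.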

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/towards-understanding-chain-of-thought/code)
