How Do Large Language Models Implement Chain-of-Thought?

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: visualization or interpretation of learned representations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Model Interpretability, Large Language Models, Transformers, Circuit Analysis
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We identify the attention heads that are critical for LLMs to complete multi-step reasoning tasks under CoT prompting, and show that the model's reasoning ability is impaired when these critical heads are knocked out.
Abstract: Chain-of-thought (CoT) prompting has been shown to significantly enhance the reasoning capabilities of large language models (LLMs). Unfortunately, the underlying mechanism of how CoT prompting works remains elusive. Recent work shows that the reasoning mechanism of LLMs can be revealed by using counterfactual examples (CEs) to perform causal interventions. Specifically, analyzing the difference between the effects caused by original examples (OEs) and CEs can identify the key attention heads related to the ongoing task, e.g., a reasoning task. However, completing reasoning tasks involves diverse abilities of language models, such as numerical computation, knowledge retrieval, and logical reasoning, which poses challenges to constructing proper CEs. In this work, we propose an in-context learning approach to construct pairs of OEs and CEs, where OEs activate the reasoning behavior and CEs are similar to OEs but do not activate it. To accurately locate the key heads, we further propose a word-of-interest (WOI) normalization approach that focuses on specific words related to the ground-truth answer. Our empirical observations show that only a small fraction of attention heads contribute to the reasoning task, primarily located in the middle and upper layers of LLMs. Intervening on these identified heads significantly hampers the model's performance on reasoning tasks. Among these heads, we find that some play a key role in judging the final answer, while others play a key role in synthesizing the step-by-step thoughts into an answer. This corresponds to the two stages of the CoT process: first think step by step to obtain intermediate thoughts, then answer the question based on these thoughts.
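
The head-knockout intervention described in the abstract can be pictured with a short sketch. The snippet below is a minimal illustration under stated assumptions, not the paper's actual code: it assumes a GPT-2-style HuggingFace model, the layer and head indices are hypothetical placeholders, and it zeroes the output of selected attention heads via a forward pre-hook on the attention output projection before comparing logits against an un-ablated run.

```python
# Minimal sketch of attention-head knockout, assuming a GPT-2-style HuggingFace
# model; module names, layer/head indices, and the prompt are illustrative
# assumptions, not taken from the paper.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model.eval()

n_heads = model.config.n_head
head_dim = model.config.n_embd // n_heads

def make_knockout_hook(head_indices):
    """Zero the contribution of the given heads before the output projection."""
    def hook(module, args):
        # Input to c_proj: (batch, seq, n_embd), with head outputs concatenated.
        hidden, = args
        hidden = hidden.clone()
        for h in head_indices:
            hidden[..., h * head_dim:(h + 1) * head_dim] = 0.0
        return (hidden,)
    return hook

# Example: knock out heads 3 and 7 in layer 10 (hypothetical "critical" heads).
layer, heads = 10, [3, 7]
handle = model.transformer.h[layer].attn.c_proj.register_forward_pre_hook(
    make_knockout_hook(heads)
)

prompt = ("Q: If there are 3 cars and each car has 4 wheels, how many wheels "
          "are there in total? A: Let's think step by step.")
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    ablated_logits = model(**inputs).logits  # compare against an un-ablated run

handle.remove()
```

In this style of analysis, the drop in probability assigned to answer-related tokens (the "words of interest") between the un-ablated and ablated runs serves as the measure of how much the knocked-out heads contribute to the reasoning task.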
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4909