An In-Context Learning Theoretic Analysis of Chain-of-Thought

Published: 18 Jun 2024, Last Modified: 14 Jul 2024 · ICML 2024 Workshop ICL Poster · CC BY 4.0
Track: long paper (up to 8 pages)
Keywords: in-context learning, large language models, chain-of-thought, learning theory
Abstract: Large language models (LLMs) have demonstrated remarkable reasoning capabilities with proper prompting strategies, such as augmenting demonstrations with chain-of-thought (CoT). However, how different intermediate steps in a CoT improve reasoning, and the principles guiding their design, remain elusive. This paper takes an initial step towards addressing these questions by introducing a new analytical framework from a learning-theoretic perspective. In particular, we identify a class of in-context learning (ICL) algorithms on few-shot CoT prompts that can learn complex non-linear functions by composing simpler predictors obtained through gradient-descent-based optimization. We show that this algorithm can be expressed by Transformers in their forward pass with simple weight constructions. We further analyse the generalization properties of the ICL algorithm for learning different families of target functions. The derived theoretical results suggest several provably effective ways of decomposing target problems and forming CoT prompts, highlighting that the bottleneck lies at the hardest reasoning step. Empirically, we demonstrate that CoT forms derived from our theoretical insights significantly enhance the reasoning capabilities of real-world LLMs on challenging arithmetic reasoning tasks.
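As a rough illustration of the step-wise idea described in the abstract (not the paper's Transformer weight construction), the sketch below learns a non-linear two-step target from demonstrations that expose an intermediate step, fitting each step with gradient descent and composing the resulting simple predictors. All variable names, the toy target, and the hyperparameters are assumptions made for illustration only.

```python
# Minimal sketch (illustrative assumption, not the paper's construction):
# CoT-style demonstrations expose an intermediate value z for each (x, y) pair,
# so the non-linear target y = w2 * relu(x @ w1) can be learned by fitting two
# simple predictors with gradient descent and composing them at query time.
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth two-step target: step 1 is linear (z = x @ w1),
# step 2 applies a scalar weight after a ReLU (y = w2 * relu(z)).
d = 4
w1_true = rng.normal(size=d)
w2_true = 2.0

X = rng.normal(size=(32, d))          # few-shot demonstration inputs
Z = X @ w1_true                       # intermediate "CoT" values in the demos
Y = w2_true * np.maximum(Z, 0.0)      # final answers

def fit_linear(A, b, lr=0.05, steps=500):
    """Fit b ~ A @ w by plain gradient descent on mean squared error."""
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        grad = A.T @ (A @ w - b) / len(b)
        w -= lr * grad
    return w

# Step-wise learning from the demonstrations:
# (1) predict the intermediate z from x, (2) predict y from relu(z).
w1_hat = fit_linear(X, Z)                                   # recovers w1_true
w2_hat = fit_linear(np.maximum(Z, 0.0)[:, None], Y)[0]      # recovers w2_true

# Compose the two simple predictors to answer a new query.
x_query = rng.normal(size=d)
y_pred = w2_hat * max(float(x_query @ w1_hat), 0.0)
y_true = w2_true * max(float(x_query @ w1_true), 0.0)
print(f"prediction {y_pred:.3f} vs. target {y_true:.3f}")
```

The point of the sketch is that each CoT step is an easy (here, linear-in-parameters) sub-problem for gradient descent, while the composed end-to-end map is non-linear; the hardest such sub-problem then governs how well the composition can be learned.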
Submission Number: 20