In-Context Learning May Not Elicit Trustworthy Reasoning: A-Not-B Errors in Pretrained Language Models

Published: 18 Jun 2024, Last Modified: 26 Jul 2024
Venue: ICML 2024 Workshop on LLMs and Cognition (Poster)
License: CC BY 4.0
Keywords: A-not-B error, in-context learning, large language model failure cases, trustworthy llm reasoning
Abstract: Recent advances in artificial intelligence (AI) have produced highly capable large language models (LLMs) that display a range of human-like abilities. Yet these pretrained LLMs remain vulnerable to striking cognitive biases. In this work, we study the A-not-B error -- a hallmark of a developmental stage in human infants, in which previously rewarded behavior persists even though the conditions have changed and only a trivial adaptation is required. Our investigation reveals that LLMs, much like human infants, erroneously apply past successful responses to slightly altered contexts. Across a variety of reasoning tasks, we demonstrate that LLMs are susceptible to the A-not-B error. Notably, smaller models exhibit heightened vulnerability, mirroring the developmental trajectory of human infants. Models pretrained on extensive, high-quality data show significant resilience, underscoring the importance of internal knowledge quality, much as rich experience enhances human cognitive abilities. Furthermore, increasing the number of in-context examples before a context change leads to more pronounced failures, indicating that LLMs are fundamentally pattern-driven and may falter at minor, benign changes to an established pattern. We open-source all code and results under the permissive MIT license to encourage reproduction and further research.
Submission Number: 58
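To make the evaluation setup described in the abstract concrete, below is a minimal illustrative sketch (not the authors' released code) of how an A-not-B style prompt might be constructed: several in-context examples all reward one option, and the final query changes the context so that the alternative becomes correct. The `query_model` call is a hypothetical placeholder for any LLM API.

```python
import random


def build_a_not_b_prompt(n_examples: int, seed: int = 0) -> str:
    """Build a few-shot prompt in which option (A) is repeatedly correct,
    then pose a final question whose correct answer switches to (B).

    Illustrative sketch only; not the paper's benchmark code.
    """
    rng = random.Random(seed)
    lines = []
    for _ in range(n_examples):
        a, b = rng.randint(2, 9), rng.randint(2, 9)
        # In every demonstration the larger number is placed at (A),
        # so "(A)" is always the rewarded answer.
        hi, lo = max(a, b), min(a, b) - 1
        lines.append(
            f"Question: Which is larger?\n(A) {hi}\n(B) {lo}\nAnswer: (A)"
        )
    # Final query: the larger number now sits at (B). A purely
    # pattern-driven model that keeps answering "(A)" commits an
    # A-not-B error despite the trivially changed condition.
    lines.append("Question: Which is larger?\n(A) 3\n(B) 8\nAnswer:")
    return "\n\n".join(lines)


if __name__ == "__main__":
    prompt = build_a_not_b_prompt(n_examples=8)
    print(prompt)
    # response = query_model(prompt)  # hypothetical LLM API call
    # An answer of "(A)" here would count as an A-not-B error.
```

Under the abstract's finding, raising `n_examples` would be expected to make the final-query error more likely, since a longer run of identical rewarded answers strengthens the established pattern.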