A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners

Published: 18 Jun 2024, Last Modified: 26 Jul 2024
Venue: ICML 2024 Workshop on LLMs and Cognition (Poster)
License: CC BY 4.0
Keywords: reasoning capability, llm reasoning, large language models, token bias, hypothesis testing, logical fallacy
TL;DR: This study proposes a hypothesis-testing framework to determine whether large language models possess genuine reasoning abilities or rely on token bias.
Abstract: This study proposes a hypothesis-testing framework to determine whether large language models (LLMs) possess genuine reasoning abilities or rely on token bias. Carefully controlled synthetic datasets are generated, and null hypotheses assuming that LLMs have genuine reasoning capabilities are tested with statistical guarantees. Inconsistent behavior across experiments leads to the rejection of these null hypotheses. Our findings, using the conjunction fallacy as a quintessential example, suggest that current LLMs still struggle with probabilistic reasoning, with apparent performance improvements largely attributable to token bias.
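The abstract's core idea (testing a "genuine reasoner" null hypothesis against token-bias behavior) can be illustrated with a minimal sketch. The paper's actual statistical procedure is not reproduced here; the function names, the exact one-sided binomial test, and the example counts below are all illustrative assumptions. The sketch treats the model's accuracy on original items as the null success probability and asks whether accuracy on token-perturbed variants is significantly lower.

```python
from math import comb

def binom_pvalue_lower(k: int, n: int, p0: float) -> float:
    """Exact one-sided binomial p-value: P(X <= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k + 1))

def token_bias_test(acc_original: float, n_perturbed: int,
                    k_correct_perturbed: int, alpha: float = 0.05):
    """Illustrative test (not the paper's exact procedure): under the null
    hypothesis that the model genuinely reasons, its accuracy should not
    drop when surface tokens are perturbed. Reject the null if accuracy on
    perturbed items is significantly below accuracy on the originals."""
    p = binom_pvalue_lower(k_correct_perturbed, n_perturbed, acc_original)
    return p, p < alpha  # (p-value, reject-null flag)

# Hypothetical counts: 90% accuracy on originals, 60/100 on perturbed items.
p_drop, reject_drop = token_bias_test(0.90, 100, 60)      # large drop -> reject
p_stable, reject_stable = token_bias_test(0.90, 100, 88)  # small drop -> keep null
```

Rejecting the null here would indicate the model's success depends on specific surface tokens rather than on the underlying reasoning task, mirroring the paper's conclusion at a high level.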
Submission Number: 10