Keywords: Language models, reasoning, synthetic data, contamination-proof, human-like errors, cognitive fallacies, Erotetic Theory of Reasoning, PyETR, logical fallacies, human-like reasoning patterns, reasoning evaluation, question-driven inference, inverse scaling laws, human cognition, rationality vs fallibility, cognitive biases, order effects, reasoning benchmarks, cognitive science alignment, AI alignment, systematic deviations from logic, normative vs descriptive reasoning, reasoning tasks, disjunction fallacy, modus ponens, modus tollens, syllogistic inference, logical validity, data contamination, natural language reasoning tasks, formal semantics, mental models, evaluation harness, Chatbot Arena, medical diagnosis, legal reasoning, high-stakes decision-making, alignment benchmarks, robust reasoning systems, AI evaluation frameworks
TL;DR: Language models of different strengths show shifting patterns in how they make mistakes, with stronger models’ errors more often resembling predictable human reasoning slips.
Abstract: We study logical reasoning in language models by asking whether their errors follow established human fallacy patterns. Using the Erotetic Theory of Reasoning (ETR) and its open‑source implementation, PyETR, we programmatically generate 383 formally specified reasoning problems and evaluate 38 models. For each response, we judge logical correctness and, when incorrect, whether it matches an ETR‑predicted fallacy. Two results stand out: (i) as a capability proxy (Chatbot Arena Elo) increases, a larger share of a model’s incorrect answers are ETR‑predicted fallacies ($\rho=0.360, p=0.0265$), while overall correctness on this dataset shows no correlation with capability; (ii) reversing premise order significantly reduces fallacy production for many models, mirroring human order effects. Methodologically, PyETR provides an open‑source pipeline for unbounded, synthetic, contamination‑resistant reasoning tests linked to a cognitive theory, enabling analyses that focus on error composition rather than error rate.
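The capability-vs-error-composition result in (i) is a Spearman rank correlation between a model's Arena Elo and the share of its incorrect answers that match an ETR-predicted fallacy. A minimal self-contained sketch of that statistic follows; the Elo values and fallacy shares in the usage lines are hypothetical illustrations, not the paper's data:

```python
def rank(values):
    """Return 1-based average ranks (ties share their mean rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-model pairs: (Arena Elo, fraction of errors that are
# ETR-predicted fallacies). Values are made up for illustration only.
elo = [1050, 1120, 1180, 1250]
fallacy_share = [0.22, 0.31, 0.29, 0.45]
rho = spearman_rho(elo, fallacy_share)
```

In practice one would use `scipy.stats.spearmanr`, which also returns the p-value; the hand-rolled version above only makes the rank-then-correlate computation explicit.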
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 18241