CORE – Cognitive Observation of Reasoning Errors

Published: 23 Sept 2025, Last Modified: 17 Feb 2026
Venue: CogInterp @ NeurIPS 2025 (Reject)
License: CC BY 4.0
Keywords: large language models; cognitive biases; decision-making; bias measurement; behavioral evaluation; debiasing
TL;DR: We introduce a multi-paradigm evaluation showing that different large language models exhibit distinct patterns of human-like cognitive biases, with public tools for replication and debiasing.
Abstract: Large language models (LLMs) can exhibit systematic judgment patterns akin to human cognitive biases. We evaluate two instruction-following chat models on nine paradigms—anchoring, framing, defaults, decoys, bandwagon, premise-order effects, conjunction fallacy, endowment effect, and sunk cost—using 221 content-diverse item pairs and 111 completions per condition (∼ 49,000 responses per model). Bias contrasts are estimated with appropriate statistical tests (Welch t, Wald z, Wilson intervals) to produce “bias fingerprints.” Results show a double dissociation: one model mirrors heuristic biases (e.g., framing, anchoring) while the other resists them but is more susceptible to structural manipulations (defaults, decoys) and premise order. We release all materials to support replication and mechanistic follow-ups, advocating multi-paradigm batteries for characterizing and debiasing LLM decision behavior.
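The bias contrasts described in the abstract can be illustrated with a minimal sketch. This is not the authors' released code; the item counts and choice counts below are hypothetical, and only the statistic names (Welch t, Wald z, Wilson interval) come from the abstract.

```python
import math

Z = 1.96  # two-sided 95% critical value

def wilson_interval(k, n, z=Z):
    """Wilson score interval for a choice-rate proportion k/n."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half

def wald_z(k1, n1, k2, n2):
    """Wald z statistic for the difference of two choice-rate proportions,
    e.g., the same item pair under gain vs loss framing."""
    p1, p2 = k1 / n1, k2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

def welch_t(xs, ys):
    """Welch's t statistic for two samples with unequal variances,
    e.g., numeric estimates under low vs high anchors."""
    n1, n2 = len(xs), len(ys)
    m1, m2 = sum(xs) / n1, sum(ys) / n2
    v1 = sum((x - m1) ** 2 for x in xs) / (n1 - 1)
    v2 = sum((y - m2) ** 2 for y in ys) / (n2 - 1)
    return (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)

# Hypothetical counts: 111 completions per condition (as in the abstract),
# with made-up choice counts for a framed item pair.
lo, hi = wilson_interval(78, 111)
z = wald_z(78, 111, 52, 111)
```

A per-paradigm "fingerprint" would then collect such contrasts across all nine paradigms, flagging a bias where the interval excludes the no-effect point.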
Submission Number: 4