The Character of Confabulation: Operationalizing a Clinical Typology for Reasoning-Mode Language Models

Parichaye Grover; Vivek Kumar Sehgal

Published: 03 Jun 2026, Last Modified: 03 Jun 2026. Venue: AI4GOOD Workshop 2026 Regular. Readers: Everyone. License: CC BY 4.0

Keywords: large language models, hallucination, confabulation, reasoning models, evaluation, character profiling, Kopelman typology, DeepSeek-R1, TriviaQA

TL;DR: We apply Kopelman's 1987 clinical confabulation typology to a reasoning-tuned language model and find that turning reasoning on transforms the model from a refuser (69%) into a confabulator (66%) without meaningfully improving accuracy.

Abstract: Language model benchmarks tell us how often a model gets the answer right. They do not tell us what kind of failure produces the rest. A model that refuses two-thirds of questions and a model that confidently fabricates two-thirds of its responses have the same accuracy. They do not have the same character. We borrow a framework from clinical psychology to make the difference visible. Kopelman's 1987 study of confabulation in Korsakoff and Alzheimer patients distinguished spontaneous (confident, elaborate, internally coherent fabrication) from provoked (briefer wrong answers elicited by direct questions). We turn his three behavioral features into three computable measurements: token-level entropy as a proxy for delivery confidence, an elaboration ratio as a proxy for narrative scaffolding, and the gap between self-reported and actual confidence as a proxy for failed self-monitoring. We apply these to 200 generations from DeepSeek-R1-Distill-Qwen-1.5B on TriviaQA, comparing reasoning-enabled and reasoning-disabled conditions. The resulting six-category profile reaches κ = 0.71 against 50 manually labeled examples — substantial agreement. With reasoning enabled, the model attempts every question and confabulates 66% of them (29% spontaneous, 37% provoked); with reasoning disabled, the same model refuses 69% of the time. Accuracy moves from 3% to 17%; character moves from refuser to confabulator. The framework does not detect a failure mode that benchmarks miss; it names a structural shift that benchmarks aggregate.
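As a rough illustration of two quantities the abstract names, the sketch below computes a mean token-level entropy and Cohen's κ in plain Python. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `mean_token_entropy` and `cohen_kappa` are hypothetical, the entropy is assumed to be Shannon entropy in nats averaged over the per-token probability distributions of one generation, and κ is the standard two-rater formula applied to automatic versus manual category labels. The elaboration ratio and the confidence gap are not sketched, because the abstract does not define them.

```python
import math
from collections import Counter


def mean_token_entropy(token_distributions):
    """Average Shannon entropy (nats) over per-token probability distributions.

    Assumption: each element is a list of probabilities for one generated
    token, already normalized to sum to 1.
    """
    entropies = [
        -sum(p * math.log(p) for p in probs if p > 0)
        for probs in token_distributions
    ]
    return sum(entropies) / len(entropies)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equal-length label sequences."""
    n = len(labels_a)
    # Fraction of items on which the two labelings agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each labeling's marginal counts.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        # Both labelings use a single identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)
```

In this sketch, lower mean entropy corresponds to a more peaked next-token distribution, which is the sense in which the abstract treats entropy as a proxy for delivery confidence. κ would be computed between the automatically assigned category of each generation and its manual label.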

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 217
