When Humans Revise Their Beliefs, Explanations Matter: Evidence from User Studies and What It Means for AI Alignment
Keywords: cognitive modeling, belief revision, human-subject studies, AI Alignment
Abstract: Understanding how humans revise their beliefs in light of new information is crucial for developing AI agents that can effectively model, and thus align with, human reasoning and decision-making. Motivated by empirical evidence from cognitive psychology, in this paper we first present three comprehensive human-subject studies showing that people consistently prefer explanation-based revisions, i.e., revisions guided by explanations, even when these result in changes to their beliefs that are more extensive than necessary. Our experiments systematically investigate how people revise their beliefs in response to inconsistencies, whether explanations are provided to them or they are left to formulate their own, demonstrating a robust preference for seemingly non-minimal revisions across different types of scenarios. Moreover, we evaluate the extent to which large language models can simulate human belief revision patterns by testing state-of-the-art models on parallel tasks and analyzing their revision choices and alignment with human preferences. These findings have implications for AI agents designed to model and interact with humans, suggesting that such agents should accommodate explanation-based, potentially non-minimal belief revision operators to better align with human cognitive processes.
Paper Type: New Full Paper
Supplementary Material: pdf
Submission Number: 1