Towards Cognitively-Faithful Decision-Making Models to Improve AI Alignment

ICLR 2026 Conference Submission 22317 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Human-Centric AI, Moral Preference Elicitation, Axiomatic Analysis, Interpretable Machine Learning
TL;DR: We propose a new way to align AI with human decision-making by modeling the cognitive processes behind choices. Our axiomatic approach processes individual features with learned rules, then aggregates them with a fixed rule (e.g., Bradley-Terry).
Abstract: Recent AI trends seek to align AI models to learned human-centric objectives, such as personal preferences, utility, or societal values. Using standard preference elicitation methods, researchers and practitioners build models of human decisions and judgments, to which AI models are then aligned. However, standard elicitation methods often fail to capture the true cognitive processes behind human decision-making, such as the use of heuristics or simplifying, structured thought patterns. To address this limitation, we take an axiomatic approach to learning cognitively faithful decision processes from pairwise comparisons. Building on the vast literature characterizing the cognitive processes that contribute to human decision-making and pairwise comparisons, we derive a class of models in which individual features are first processed with learned rules and then aggregated via a fixed rule, such as the Bradley-Terry rule, to produce a decision. This structured processing of information ensures that such models are realistic and feasible candidates for representing underlying human decision-making processes. We demonstrate the efficacy of this modeling approach by learning interpretable models of human decision-making in a kidney allocation task, and show that our proposed models match or surpass the accuracy of prior models of human pairwise decision-making.
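To make the model class in the abstract concrete, below is a minimal sketch (not the authors' code) assuming the learned per-feature rules are small MLPs and the fixed aggregation is the Bradley-Terry rule, i.e., a sigmoid of the difference between summed per-feature scores for the two alternatives. The class and variable names (FeatureRule, BradleyTerryPairwiseModel) and the synthetic training loop are purely illustrative.

```python
import torch
import torch.nn as nn

class FeatureRule(nn.Module):
    """Hypothetical learned rule for a single feature: a small MLP mapping one
    scalar feature to a score. The paper's actual rule class may differ."""
    def __init__(self, hidden: int = 8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x):  # x: (batch, 1)
        return self.net(x)

class BradleyTerryPairwiseModel(nn.Module):
    """Each feature is processed by its own learned rule; the per-feature scores
    are summed, and the two alternatives are compared with a fixed Bradley-Terry
    aggregation (sigmoid of the score difference)."""
    def __init__(self, num_features: int):
        super().__init__()
        self.rules = nn.ModuleList(FeatureRule() for _ in range(num_features))

    def score(self, x):  # x: (batch, num_features) -> (batch,)
        parts = [rule(x[:, k:k + 1]) for k, rule in enumerate(self.rules)]
        return torch.cat(parts, dim=1).sum(dim=1)

    def forward(self, x_a, x_b):
        # P(a is preferred over b) under the Bradley-Terry rule.
        return torch.sigmoid(self.score(x_a) - self.score(x_b))

# Toy usage: fit the model to synthetic pairwise choices over 4 features.
if __name__ == "__main__":
    torch.manual_seed(0)
    model = BradleyTerryPairwiseModel(num_features=4)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    x_a, x_b = torch.rand(256, 4), torch.rand(256, 4)
    y = (x_a.sum(dim=1) > x_b.sum(dim=1)).float()  # synthetic "human" choices
    for _ in range(200):
        opt.zero_grad()
        loss = nn.functional.binary_cross_entropy(model(x_a, x_b), y)
        loss.backward()
        opt.step()
```

Because each feature has its own learned rule and the aggregation step is fixed, the learned per-feature transforms can be inspected directly, which is the sense in which such models aim to be interpretable candidates for the underlying decision process.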
Supplementary Material: pdf
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 22317