Morality is Contextual: Learning Interpretable Moral Contexts from Human Data with Probabilistic Clustering and Large Language Models

ICLR 2026 Conference Submission 20140 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Moral Dilemmas, Context Learning, Large Language Models, Probabilistic Modeling, Explainable AI, AI Alignment, Computational Ethics
TL;DR: COMETH-RL learns moral contexts from human judgments using probabilistic clustering and LLM semantics, then predicts and explains new cases with interpretable features, achieving about 2x higher alignment than end-to-end prompting.
Abstract: Moral actions are judged not only by their outcomes but by the context in which they occur. We present \textsc{COMETH} (Contextual Organization of Moral Evaluation from Textual Human inputs), a framework that integrates a probabilistic context learner with LLM-based semantic abstraction and human moral evaluations to model how context shapes the acceptability of ambiguous actions. We curate an empirically grounded dataset of 300 scenarios across six core actions (violating \emph{Do not kill}, \emph{Do not deceive}, and \emph{Do not break the law}) and collect ternary judgments (Blame/Neutral/Support) from $N{=}101$ participants. A preprocessing pipeline standardizes actions via an LLM-based filter and MiniLM embeddings clustered with K-means, producing robust, reproducible core-action clusters. \textsc{COMETH} then learns action-specific \emph{moral contexts} by clustering scenarios online from human judgment distributions using principled divergence criteria. To generalize and explain predictions, a Generalization module extracts concise, non-evaluative binary contextual features and learns feature weights in a transparent likelihood-based model. Empirically, \textsc{COMETH} roughly doubles alignment with majority human judgments relative to end-to-end LLM prompting ($\approx 60\%$ vs.\ $\approx 30\%$ on average), while revealing which contextual features drive its predictions. The contributions are: (i) an empirically grounded moral-context dataset, (ii) a reproducible pipeline combining human judgments with model-based context learning and LLM semantics, and (iii) an interpretable alternative to end-to-end LLMs for context-sensitive moral prediction and explanation.
Supplementary Material: pdf
Primary Area: interpretability and explainable AI
Submission Number: 20140
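
For a concrete picture of the pipeline described in the abstract, here is a minimal, illustrative Python sketch (not the authors' code): MiniLM sentence embeddings clustered with K-means to standardize core actions, followed by an online, divergence-based grouping of scenarios by their Blame/Neutral/Support judgment distributions. The embedding model name, the number of clusters, the Jensen-Shannon criterion, and the 0.1 threshold are assumptions chosen for illustration; the paper's actual divergence criterion and its likelihood-based feature-weighting model are not reproduced here.

    # Illustrative sketch only; model name, K, divergence choice, and threshold are assumptions.
    import numpy as np
    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans
    from scipy.spatial.distance import jensenshannon

    # Step 1: core-action clustering from scenario action phrases (MiniLM + K-means).
    actions = ["lie to protect a friend", "deceive an employer", "drive above the speed limit"]
    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # MiniLM sentence embeddings
    emb = embedder.encode(actions)
    core_action = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)

    # Step 2: group scenarios whose empirical (Blame, Neutral, Support) distributions
    # are close under a divergence threshold, processing scenarios one at a time.
    judgments = np.array([[0.70, 0.20, 0.10],
                          [0.10, 0.20, 0.70],
                          [0.65, 0.25, 0.10]])

    def assign_context(dist, contexts, threshold=0.1):
        """Assign a scenario to the nearest existing context (by Jensen-Shannon
        distance between judgment distributions) or open a new context."""
        for idx, centroid in enumerate(contexts):
            if jensenshannon(dist, centroid) < threshold:
                return idx
        contexts.append(dist.copy())
        return len(contexts) - 1

    contexts = []
    labels = [assign_context(d, contexts) for d in judgments]
    print(core_action, labels)

The one-pass assignment loop mirrors the abstract's description of clustering scenarios online from human judgment distributions; a new context is opened only when a scenario's distribution diverges from all existing contexts beyond the threshold.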