Keywords: Large Language Models, Consistency, Morality, Evaluation, Value Pluralism
TL;DR: Current LLM benchmarks treat moral consistency as an unqualified good, but since departures from consistency can be justified and desirable, we should instead aim for reasonable inconsistency: transparent deviations grounded in recognizable moral reasons.
Abstract: It is a widely held assumption that Large Language Models should be morally consistent. In this position paper, we critically analyze this assumption. We disambiguate six distinct notions of moral consistency and show that, for each, there exist cases where inconsistency can be both justified and desirable. Building on this analysis, we propose that LLMs should instead be $\textit{reasonably}$ morally inconsistent: consistency should be treated as a desirable but ultimately defeasible norm, with deviations permitted when $\textit{justified}$ by recognizable moral or contextual reasons that are made $\textit{transparent}$. We argue that recent benchmarks, by treating moral consistency as an unqualified good, are misguided and potentially counterproductive. As an alternative, we point towards pluralistic and process-focused alignment, and sketch a concrete benchmark format that aims to better accommodate the legitimate role of inconsistency in moral behavior and thought.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 31