Mission Impossible: Universal Moral Alignment

Published: 02 Jun 2026, Last Modified: 09 Jun 2026
Venue: Pluralistic-Alignment 2026
License: CC BY 4.0
Keywords: Alignment, Pluralistic, Impossibilities
Abstract: Universal moral alignment for large language models (LLMs) is often framed as the goal of learning a single policy that behaves in accordance with human values. This framing assumes that sufficiently capable models can approximate a coherent and universally valid moral objective. We argue that this assumption is false in pluralistic settings. Drawing on a preference-learning view of alignment and insights from social choice theory, we show that when different groups hold internally coherent but conflicting moral judgments over the same context-action pairs, no non-degenerate single policy can satisfy all groups simultaneously. Conditioning on group membership can mitigate some cross-group conflicts, but shifts the problem to group construction, overlapping memberships, and within-group disagreement. Under stronger forms of disagreement, aggregation can even produce policies that are misaligned with every group. We outline a constructive agenda that replaces universal moral alignment with pluralistic and procedurally explicit alternatives, including normative governance mechanisms, impossibility-aware evaluation, and richer representations of human preferences that make disagreement visible rather than averaging it away.
Submission Number: 54
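
The two central claims of the abstract can be illustrated with a toy example. The sketch below is not the paper's construction: the group names, action names, reward values, and the "more than half the probability mass" satisfaction criterion are all hypothetical choices made for illustration. Part 1 shows that when two groups hold opposite strict preferences over the same context-action pair, no single stochastic policy satisfies both. Part 2 shows that averaging group rewards can select an action that is no group's top choice, which is one simple (and possibly weaker) reading of "misaligned with every group".

```python
# Toy illustration only. All names and numbers are hypothetical assumptions,
# not taken from the paper.

# Part 1: one context, two actions, two groups with opposite preferences.
def satisfied(p_comply: float, prefers: str) -> bool:
    """Assumed criterion: a group is satisfied when the policy places more
    than half of its probability mass on that group's preferred action."""
    p_preferred = p_comply if prefers == "comply" else 1.0 - p_comply
    return p_preferred > 0.5


# Sweep single policies, parameterized by the probability of "comply".
grid = [i / 100 for i in range(101)]
policies_satisfying_both = [
    p for p in grid
    if satisfied(p, "comply") and satisfied(p, "refuse")
]

# Part 2: three actions, aggregation by averaging group rewards.
actions = ["comply", "refuse", "hedge"]
rewards = {
    "group_a": {"comply": 1.0, "refuse": 0.0, "hedge": 0.6},
    "group_b": {"comply": 0.0, "refuse": 1.0, "hedge": 0.6},
}

# Mean reward per action across groups (the assumed aggregation rule).
mean_reward = {
    a: sum(r[a] for r in rewards.values()) / len(rewards) for a in actions
}

# Action a reward-maximizing single policy would pick under the mean reward.
aggregate_choice = max(actions, key=mean_reward.get)

# Each group's own most-preferred action.
top_choice = {g: max(actions, key=r.get) for g, r in rewards.items()}

if __name__ == "__main__":
    print("policies satisfying both groups:", policies_satisfying_both)
    print("aggregate choice:", aggregate_choice)
    print("per-group top choices:", top_choice)
```

Under these assumptions the sweep finds no policy that satisfies both groups, and the averaged reward selects the compromise action, which neither group ranks first. Changing the satisfaction threshold or the aggregation rule changes the details, so the example shows the shape of the argument rather than its general proof.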