Rethinking Diversity-Preserving RL for Pluralistic Alignment: Empirical Evidence from Rubric-Grounded Moral Reasoning

Published: 02 Jun 2026, Last Modified: 02 Jun 2026, Pluralistic-Alignment 2026, CC BY 4.0
Keywords: LLM alignment, Moral reasoning, Reinforcement learning with verifiable rewards
TL;DR: This paper argues that moral reasoning does not inherently require diversity-preserving RL: under rubric-grounded alignment rewards, apparent moral pluralism does not necessarily translate into a multi-modal high-reward landscape.
Abstract: Pluralistic alignment is often associated with preserving diverse high-reward responses, especially in moral reasoning where multiple answers may be defensible under different value systems. This paper studies that assumption in a rubric-grounded reinforcement learning with verifiable rewards (RLVR) setting. Using MoReBench, we compare representative reward-maximizing methods and a distribution-matching baseline under a shared training and evaluation pipeline enabled by a distilled local judge. Across two model families and two moral-reasoning subtasks, reward-maximizing methods match or outperform the distribution-matching baseline. Semantic visualization and qualitative case analysis further suggest that, under current rubric-grounded rewards, high-reward moral-reasoning responses are often more concentrated than the surface pluralism of the task might suggest. These results do not imply that diversity is unimportant in alignment. Rather, they indicate that the need for diversity-preserving RL should be established empirically from the evaluator-induced reward landscape. For pluralistic alignment, this shifts attention from domain-level intuitions alone toward the joint role of benchmark design, reward definition, and optimization objective.
Submission Number: 58