Position: Aggregate Preference Optimization Hides a Posterior Identifiability Failure for Pluralistic Alignment

Published: 02 Jun 2026, Last Modified: 02 Jun 2026, Venue: Pluralistic-Alignment 2026, Readers: Everyone, License: CC BY 4.0
Keywords: pluralistic alignment, RLHF, posterior identifiability, multi-stakeholder preferences, Bayesian workflow, position paper
TL;DR: Aggregate RLHF cannot identify subgroup preferences. An aggregate-fit MSE of 0.008 hides a within-subgroup MSE roughly 8x larger, and the masking ratio grows from 1.08 to 4.32 as subgroup diversity increases. We propose a four-item disclosure standard.
Abstract: RLHF and related preference-optimization methods produce a single reward model from aggregate pairwise feedback collected across diverse users. We argue that this aggregation hides a structural posterior identifiability failure: when subgroups hold conflicting preferences, the joint posterior over (subgroup composition, within-subgroup preference) is not identifiable from aggregate pairwise data alone, regardless of sample size. The aggregate-fit reward model can achieve low training loss while assigning systematically incorrect predictions to within-subgroup preferences. We support the position with two empirical demonstrations on a Bradley-Terry preference model with K=2 subgroups holding opposing preferences over five attributes: under high-conflict subgroups, the aggregate-fit model collapses to a near-zero theta vector with 0.008 aggregate MSE but 0.06-0.07 within-subgroup prediction MSE (an 8x gap), and a sweep over subgroup diversity sigma_groups in [0, 3] shows the masking ratio (within-subgroup error / aggregate error) growing from 1.08 to 4.32. We propose a four-item disclosure standard for pluralistic alignment papers: subgroup composition disclosure, within-subgroup posterior reporting, identification gap quantification, and prior sensitivity to the assumed mixture. The standard makes the difference between "aggregate-fit" and "pluralism-faithful" empirically auditable.
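The collapse described in the abstract can be illustrated with a minimal sketch, under assumptions that are not taken from the paper: the attribute weights, the equal subgroup mix, the sample size, the plain gradient-descent fit, and the way both errors are measured (squared error between predicted and true preference probabilities) are all hypothetical choices. The sketch is not intended to reproduce the reported figures; it only shows how a single Bradley-Terry fit on pooled data from two opposing subgroups can look well calibrated in aggregate while mispredicting each subgroup.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def simulate(n_pairs=4000, d=5, seed=0):
    """Pairwise comparisons from K=2 subgroups with opposing preferences."""
    rng = np.random.default_rng(seed)
    theta = np.linspace(-1.0, 1.0, d)          # hypothetical attribute weights
    theta_groups = np.stack([theta, -theta])   # subgroup 1 mirrors subgroup 0
    x = rng.normal(size=(n_pairs, d))          # attribute differences (item a minus item b)
    g = rng.integers(0, 2, size=n_pairs)       # hidden subgroup label, equal mix assumed
    p_true = sigmoid(np.sum(x * theta_groups[g], axis=1))
    y = (rng.random(n_pairs) < p_true).astype(float)  # 1 if item a is preferred
    return x, y, theta_groups

def fit_aggregate(x, y, lr=0.5, steps=500):
    """Single Bradley-Terry (logistic) fit that ignores subgroup labels."""
    theta = np.zeros(x.shape[1])
    for _ in range(steps):
        grad = x.T @ (sigmoid(x @ theta) - y) / len(y)
        theta -= lr * grad
    return theta

def masking_report(seed=0):
    x, y, theta_groups = simulate(seed=seed)
    theta_hat = fit_aggregate(x, y)
    p_hat = sigmoid(x @ theta_hat)
    # Aggregate target: mixture-averaged preference probability per pair.
    p_mix = 0.5 * (sigmoid(x @ theta_groups[0]) + sigmoid(x @ theta_groups[1]))
    agg_mse = float(np.mean((p_hat - p_mix) ** 2))
    # Within-subgroup error: the same predictions scored against each subgroup.
    within_mse = [float(np.mean((p_hat - sigmoid(x @ t)) ** 2)) for t in theta_groups]
    return theta_hat, agg_mse, within_mse

theta_hat, agg_mse, within_mse = masking_report()
print("theta_hat:", np.round(theta_hat, 3))
print("aggregate MSE:", agg_mse)
print("within-subgroup MSE:", within_mse)
```

In this sketch the two subgroups cancel exactly, so the pooled preference probability is 0.5 for every pair and the fitted weights sit near zero. Dividing either within-subgroup error by the aggregate error gives one possible reading of the masking ratio; the paper's own definition may differ.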
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 149