Separating Value Disagreement from Data Uncertainty in Pluralistic Preference Data

Published: 02 Jun 2026, Last Modified: 11 Jun 2026
Venue: Pluralistic-Alignment 2026
License: CC BY 4.0
Keywords: pluralistic alignment, aleatoric uncertainty, epistemic uncertainty, credal sets, preference learning
TL;DR: A credal disjoint-head model separates value disagreement from data uncertainty in pluralistic preference data, supporting a per-item routing rule between 'collect more annotators' and 'preserve disagreement.'
Abstract: Pluralistic preference data entangles two operationally distinct phenomena: genuine value disagreement, which should be preserved as a multi-modal label, and under-sampled items, which need more annotation. Standard ensemble-uncertainty estimators conflate the two and collapse both into a single disagreement signal. We propose a credal disjoint-head model that learns the population-mean preference and a preference-dispersion proxy on separate gradient paths, encouraging a structural separation between the two quantities. Our most robust finding is decorrelation: on a synthetic generator with closed-form ground truth and on a HelpSteer3 disagreement subset, the two estimators remain near-independent, whereas the baseline keeps them tightly coupled. Recovery of the ground-truth epistemic ranking is a secondary result (clear in the data-rich regime, modest on average), so we foreground decorrelation over recovery. The decomposition supports a candidate per-item routing rule between "collect more annotators" and "preserve disagreement." A pilot simulation with held-out annotators shows that the rule routes items in the predicted direction.
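The abstract describes the decomposition and routing rule only at a high level. The following is a minimal post-hoc sketch of how such per-item outputs might be consumed, under assumptions that are not taken from the paper: K mean heads each output a population-mean preference probability per item; the epistemic score is the width of the credal interval [min, max] spanned by those heads; the dispersion head's output is used directly as the aleatoric score; and the thresholds `tau_e` and `tau_a`, the third "consensus" route, and all function names (`credal_epistemic`, `route`, `spearman`) are hypothetical. Training with separate gradient paths is not shown. Decorrelation is checked with a Spearman rank correlation, which may differ from the metric the paper actually uses. All numbers are made up for illustration.

```python
import numpy as np


def credal_epistemic(mean_heads: np.ndarray) -> np.ndarray:
    """Width of the per-item credal interval spanned by the mean heads.

    mean_heads has shape (K, N): row k holds head k's estimate of the
    population-mean preference probability for each of N items. The
    interval [min_k, max_k] is treated as a credal set over that
    probability, and its width is used as the epistemic score (assumption).
    """
    return mean_heads.max(axis=0) - mean_heads.min(axis=0)


def route(epistemic: np.ndarray, dispersion: np.ndarray,
          tau_e: float = 0.2, tau_a: float = 0.15) -> np.ndarray:
    """Hypothetical per-item routing rule with illustrative thresholds.

    Epistemic uncertainty is checked first: an under-sampled item is sent
    for more annotation before its dispersion estimate is trusted.
    """
    return np.where(
        epistemic >= tau_e,
        "collect_more_annotators",
        np.where(dispersion >= tau_a, "preserve_disagreement", "consensus"),
    )


def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman rank correlation between two score vectors (assumes no ties)."""
    rx = np.argsort(np.argsort(x)).astype(float)  # rank of each entry of x
    ry = np.argsort(np.argsort(y)).astype(float)  # rank of each entry of y
    return float(np.corrcoef(rx, ry)[0, 1])


# Toy outputs for 4 items from K = 3 mean heads (made-up numbers).
mean_heads = np.array([
    [0.50, 0.90, 0.20, 0.55],
    [0.52, 0.88, 0.60, 0.50],
    [0.48, 0.91, 0.35, 0.52],
])
# Toy output of a separate dispersion head (made-up numbers).
dispersion = np.array([0.24, 0.02, 0.10, 0.05])

epi = credal_epistemic(mean_heads)
print(route(epi, dispersion).tolist())
# -> ['preserve_disagreement', 'consensus', 'collect_more_annotators', 'consensus']
print(spearman(epi, dispersion))  # values near 0 would indicate decorrelation
```

Giving the epistemic check priority encodes the idea that a dispersion estimate on an under-sampled item is not yet trustworthy. This ordering is a design choice of the sketch, not a claim about the rule evaluated in the paper.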
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 119