Keywords: AI Alignment, Reinforcement Learning from Human Feedback, Pluralistic Alignment, Preference Learning, Social Choice Theory
TL;DR: We present the first positive result on aggregating the expected utility of a heterogeneous population in LLM alignment: a practical estimator with strong empirical performance and provable fast rates of convergence.
Abstract: Traditional LLM alignment methods are vulnerable to heterogeneity in human preferences. Fitting a naïve probabilistic model to pairwise comparison data (say, over prompt-completion pairs) yields an inconsistent estimate of the population-average utility, a canonical measure of social welfare. We propose a new method, dubbed the sign estimator, which is simple, provably consistent, and efficient: it replaces cross-entropy with a binary classification loss in the aggregation step. This modification recovers consistent ordinal alignment under mild assumptions, without requiring explicit modeling of user heterogeneity, and achieves the first polynomial finite-sample error bounds in this setting. Using standard benchmark experiments and a new empirical methodology for assessing the impact of heterogeneity, we find that the sign estimator substantially reduces preference distortion relative to standard RLHF: it cuts disagreement with true population preferences from 12% to 8% and reduces angular estimation error by nearly 35%. Our empirical setup leverages digital twins (LLMs calibrated to real-world US panelists) to simulate realistic population-level heterogeneity and obtain a ground-truth alignment target for evaluating different estimators.
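To make the aggregation-step change concrete, here is a minimal sketch contrasting the standard Bradley-Terry cross-entropy objective with a classification-style loss on the reward margin. Assumptions: a PyTorch reward model that scores the preferred and rejected completion of each pair, and a hinge surrogate standing in for the paper's binary classification loss, whose exact form is not specified on this page.

```python
import torch

def bt_cross_entropy_loss(r_pref: torch.Tensor, r_rej: torch.Tensor) -> torch.Tensor:
    """Standard RLHF aggregation: Bradley-Terry logistic log-loss on the
    reward margin. Per the abstract, this estimator is inconsistent for
    the population-average utility under heterogeneous raters."""
    return -torch.nn.functional.logsigmoid(r_pref - r_rej).mean()

def sign_estimator_loss(r_pref: torch.Tensor, r_rej: torch.Tensor) -> torch.Tensor:
    """Illustrative stand-in for the sign estimator: treat each comparison
    as binary classification of the margin's sign, here via a hinge
    surrogate for the 0-1 loss (the surrogate choice is an assumption)."""
    margin = r_pref - r_rej
    return torch.clamp(1.0 - margin, min=0.0).mean()

# Toy usage: rewards the model assigns to preferred vs. rejected completions.
r_pref = torch.tensor([1.2, 0.3, 0.9])
r_rej = torch.tensor([0.4, 0.5, -0.1])
print(bt_cross_entropy_loss(r_pref, r_rej))  # cross-entropy objective
print(sign_estimator_loss(r_pref, r_rej))    # classification-loss objective
```

The intuition for the swap is that a classification loss depends only on which completion the rater preferred (the sign of the margin), not on a parametric model of preference probabilities, which is what makes it robust to unmodeled rater heterogeneity.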
Supplementary Material: pdf
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 23560