Robust AI Evaluation through Maximal Lotteries
Track: Main Papers Track (6 to 9 pages)
Keywords: Social Choice Theory, AI Evaluation, Robust Optimization
TL;DR: We introduce robust lotteries to aggregate heterogeneous pairwise preferences into reliable model evaluations.
Abstract: The standard way to evaluate language models on subjective tasks is through pairwise comparisons: an annotator chooses the "better" of two model responses for a given prompt. These comparisons are then aggregated into a single ranking via the Bradley–Terry (BT) framework, forcing heterogeneous preferences into a total order and violating basic social-choice desiderata. In contrast, social choice theory offers an alternative: maximal lotteries, which aggregate pairwise preferences without imposing any assumptions on their structure. However, we show that maximal lotteries can be highly sensitive to heterogeneity among annotators and across prompts. We introduce *robust lotteries*, which optimize worst-case performance under plausible shifts in the preference data. On large-scale preference datasets, robust lotteries achieve more reliable win-rate guarantees across the annotator distribution and recover a stable set of top-performing models.
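To make the abstract's central object concrete: a maximal lottery is the optimal mixed strategy of the symmetric zero-sum game whose payoff matrix is the skew-symmetric margin matrix of pairwise preferences, and it can be found by linear programming. The sketch below is illustrative only (not the submission's implementation) and assumes a `margins` matrix where entry (i, j) is the net fraction of comparisons preferring model i over model j:

```python
import numpy as np
from scipy.optimize import linprog

def maximal_lottery(margins):
    """Compute a maximal lottery as the optimal mixed strategy of the
    symmetric zero-sum game with skew-symmetric payoff matrix `margins`."""
    m = np.asarray(margins, dtype=float)
    n = m.shape[0]
    # Variables: p_1..p_n (lottery weights) and v (game value).
    # Maximize v subject to (p^T M)_j >= v for every column j,
    # p >= 0, sum(p) = 1. linprog minimizes, so the objective is -v.
    c = np.zeros(n + 1)
    c[-1] = -1.0
    A_ub = np.hstack([-m.T, np.ones((n, 1))])   # v - (p^T M)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)  # sum(p) = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]   # p >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq,
                  b_eq=b_eq, bounds=bounds)
    return res.x[:n]

# Three models in a Condorcet cycle (no total order exists);
# the maximal lottery mixes uniformly over all three.
margins = np.array([[0, 1, -1],
                    [-1, 0, 1],
                    [1, -1, 0]], dtype=float)
print(np.round(maximal_lottery(margins), 3))  # [0.333 0.333 0.333]
```

The cyclic example illustrates why BT's total order is restrictive: no single winner exists, yet the lottery still yields a well-defined aggregate. The robust variant proposed in the paper would instead optimize this strategy against a worst case over plausible margin matrices.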
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 29