Keywords: alignment, preference learning, RLHF, win rate, language model
TL;DR: We provide a win-rate-centric framework to unify disparate methods in preference learning.
Abstract: The surging interest in learning from preference data has resulted in an elaborate landscape of methods and evaluations.
This work offers a framework to simplify this landscape, starting from the underlying sampling distribution for preference data.
First, we show that the only evaluation of a generative model that is grounded in the preference data sampling distribution is win rate.
Given that win rate is the only quantity from preference data alone that can matter, we relate common preference learning algorithms to direct win rate optimization (DWRO). We outline the theoretical benefits of RLHF as a variant of DWRO; explain why checkpointing is difficult with DPO, a non-DWRO objective; and characterize the limits of SFT on preferred samples in terms of the win rate improvement it can achieve.
Furthermore, we provide closed-form expressions for the expected win rate improvement under the above objectives, formalizing the role of a model's starting point in the achievable win rate improvement. Finally, we conduct an empirical analysis of existing methods and alternative DWRO objectives, which suggests that optimization improvements are likely key to advancing preference learning.
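For concreteness, here is a minimal sketch (not taken from the paper) of how win rate against a reference model could be estimated by Monte Carlo; the helpers `sample_model`, `sample_reference`, and the preference judge `prefers` are hypothetical placeholders standing in for the preference data sampling distribution.

```python
# Illustrative sketch only: Monte Carlo estimate of win rate against a reference model.
# `sample_model`, `sample_reference`, and `prefers` are assumed, hypothetical callables.
import random

def estimate_win_rate(prompts, sample_model, sample_reference, prefers, n=1000):
    """Fraction of comparisons in which the model's sample is preferred over the reference's."""
    wins = 0
    for _ in range(n):
        x = random.choice(prompts)       # draw a prompt
        y_model = sample_model(x)        # response from the model under evaluation
        y_ref = sample_reference(x)      # response from the reference model
        wins += prefers(y_model, y_ref)  # 1 if the model's response is preferred
    return wins / n
```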
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12482