Mapping Social Choice Theory to RLHF

Published: 05 Mar 2024, Last Modified: 08 May 2024
Venue: ICLR 2024 R2-FM Workshop Poster
License: CC BY 4.0
Keywords: social choice, rlhf
TL;DR: How can we bridge the gap between social choice theory and RLHF?
Abstract: Recent work on the limitations of using reinforcement learning from human feedback (RLHF) to incorporate human preferences into model behavior often raises social choice theory as a reference point. Social choice theory’s analysis of settings such as voting mechanisms provides technical infrastructure that can inform how to aggregate human preferences amid disagreement. We analyze the problem settings of social choice and RLHF, and identify differences between them that prevent well-known technical results in social choice from immediately applying to RLHF. We then redefine canonical desiderata from social choice theory for the RLHF context and discuss how they may serve as analytical tools for open problems in RLHF. Finally, we contextualize the role of social choice in the broader political theory literature on democracy and collective decision making.
Submission Number: 74