Estimating and Penalizing Induced Preference Shifts in Recommender Systems

Published: 28 Jan 2022, Last Modified: 13 Feb 2023. ICLR 2022 Submission.
Keywords: recommender systems, preference shift, preference estimation, preference tampering
Abstract: The actions that a recommender system (RS) takes -- the content it exposes users to -- influence the preferences users have over what content they want. Therefore, when an RS designer chooses which system to deploy, they are implicitly \emph{choosing how to shift} or influence user preferences. Moreover, if the RS is trained via long-horizon optimization (e.g., reinforcement learning), it will have incentives to manipulate preferences, i.e., to shift them so they are easier to satisfy and thus conducive to higher reward. While some work has argued for making systems myopic to avoid this issue, myopic systems can still influence preferences in undesirable ways. In this work, we argue that we need to enable system designers to \textit{estimate} the shifts an RS \emph{would} induce; \textit{evaluate}, before deployment, whether the shifts are undesirable; and even \textit{actively optimize} to avoid such shifts. These steps involve two challenging ingredients: \emph{estimation} requires the ability to anticipate how hypothetical policies would influence user preferences if deployed -- we do this by training a predictive model of users, learned from historical interaction data, that implicitly captures their preference dynamics; \emph{evaluation} and \emph{optimization} additionally require metrics to assess whether such influences are manipulative or otherwise unwanted -- we introduce the notion of “safe shifts”, which define a trust region within which behavior is believed to be safe. We show that recommender systems that optimize for staying in the trust region can avoid manipulative behaviors (e.g., changing preferences in ways that make users more predictable), while still generating engagement.
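The sketch below is a minimal illustration of the estimate-then-penalize idea the abstract describes, not the authors' implementation: it rolls out a stand-in learned user model under a candidate policy and under a trusted baseline policy, then penalizes the distance between the preference shifts each induces. The specific names (`user_model`, `rollout`), the linear "nudge" dynamics, the baseline choice, and the L1 penalty are all illustrative assumptions.

```python
# Hypothetical sketch (not the paper's code): estimate the preference shift a
# candidate recommender policy would induce by rolling out a learned user model,
# then penalize deviation from the shift a trusted "safe" baseline induces.
import numpy as np

rng = np.random.default_rng(0)
N_ITEMS, HORIZON = 5, 50

def user_model(prefs, item):
    """Stand-in preference dynamics: exposure nudges preferences toward the
    recommended item. In the paper's setup this model would be trained on
    historical user interaction data."""
    nudge = np.zeros(N_ITEMS)
    nudge[item] = 0.1
    new = prefs + nudge
    return new / new.sum()

def rollout(policy, prefs):
    """Simulate HORIZON recommendations and return the final preference vector."""
    for _ in range(HORIZON):
        item = policy(prefs)
        prefs = user_model(prefs, item)
    return prefs

# Candidate policy: always pushes one item, making the user more predictable.
candidate = lambda prefs: 0
# Safe baseline policy: recommends in proportion to current preferences.
baseline = lambda prefs: rng.choice(N_ITEMS, p=prefs)

initial = np.full(N_ITEMS, 1.0 / N_ITEMS)
prefs_candidate = rollout(candidate, initial.copy())
prefs_baseline = rollout(baseline, initial.copy())

# Penalty: distance between the induced shift and the "safe" shift; an RS
# optimized with engagement_reward - lambda * penalty is encouraged to stay
# inside the trust region.
penalty = np.linalg.norm(prefs_candidate - prefs_baseline, ord=1)
print(f"estimated induced-shift penalty: {penalty:.3f}")
```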