Visible to everyone since 13 Oct 2023
We study the privacy of reinforcement learning from human feedback (RLHF). In particular, we focus on reinforcement learning from preference rankings under the constraint of differential privacy, in MDPs whose true rewards are linear functions. To this end, we analyze $(\epsilon,\delta)$-differential privacy (DP) for both the Bradley-Terry-Luce (BTL) model and the Plackett-Luce (PL) model, and we give a differentially private algorithm for learning rewards from human rankings. We further show that the privately learned rewards can be used to train policies whose statistical performance guarantees asymptotically match those of the best known non-private algorithms, which in some settings are minimax optimal.
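The abstract does not specify the private mechanism. The sketch below makes several assumptions. It takes linear rewards $r_\theta(\phi) = \theta^\top \phi$ over a known feature map $\phi$ and writes out the BTL and PL likelihoods. It then fits a BTL reward model privately using objective perturbation, a standard technique for private logistic regression that is swapped in here for illustration and is not necessarily the paper's algorithm. All function names are hypothetical. The noise scale `sigma` is also a hypothetical parameter: it is not calibrated to any particular $(\epsilon,\delta)$, which would require a sensitivity analysis this sketch does not reproduce.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def reward(theta, phi):
    """Assumed linear reward r(phi) = theta . phi."""
    return sum(t * x for t, x in zip(theta, phi))

def btl_prob(theta, phi_a, phi_b):
    """BTL probability that item a is preferred over item b."""
    return sigmoid(reward(theta, phi_a) - reward(theta, phi_b))

def logsumexp(xs):
    """Stable log(sum(exp(x))) over a non-empty list."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def pl_log_likelihood(theta, ranked_phis):
    """Plackett-Luce log-probability of a full ranking (best item first)."""
    rs = [reward(theta, phi) for phi in ranked_phis]
    return sum(rs[j] - logsumexp(rs[j:]) for j in range(len(rs)))

def fit_private_btl(comparisons, dim, sigma, lam=0.01, lr=0.5, steps=300, seed=0):
    """Hypothetical objective-perturbation sketch for BTL reward estimation.

    comparisons: list of (phi_winner, phi_loser) feature pairs.
    Minimizes by gradient descent
        (1/n) * sum_i -log sigmoid(theta . (phi_w - phi_l))
        + (lam / 2) * ||theta||^2 + (b . theta) / n,
    where b ~ N(0, sigma^2 I). sigma is NOT calibrated to any (epsilon, delta).
    """
    rng = random.Random(seed)
    b = [rng.gauss(0.0, sigma) for _ in range(dim)]  # random linear perturbation
    n = len(comparisons)
    theta = [0.0] * dim
    for _ in range(steps):
        # Gradient of the ridge regularizer and of the random linear term.
        grad = [lam * t + bj / n for t, bj in zip(theta, b)]
        # Gradient of the average BTL negative log-likelihood.
        for phi_w, phi_l in comparisons:
            p = btl_prob(theta, phi_w, phi_l)
            for j in range(dim):
                grad[j] -= (1.0 - p) * (phi_w[j] - phi_l[j]) / n
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

# Usage on synthetic BTL comparisons drawn from a made-up true parameter.
data_rng = random.Random(1)
true_theta = [1.0, -1.0]
data = []
for _ in range(200):
    a = [data_rng.uniform(-1, 1) for _ in range(2)]
    c = [data_rng.uniform(-1, 1) for _ in range(2)]
    if data_rng.random() < btl_prob(true_theta, a, c):
        data.append((a, c))
    else:
        data.append((c, a))
theta_hat = fit_private_btl(data, dim=2, sigma=1.0)
```

For rankings of two items, the PL log-likelihood reduces to the log of the BTL probability. This is why a single pairwise estimator template can serve as a starting point for both models.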