Differentially Private Reward Estimation from Preference Based Feedback

Published: 29 Jun 2023, Last Modified: 04 Oct 2023, MFPL Poster
Keywords: differential privacy, estimation, preference feedback, reward learning
TL;DR: We present the first results on parameter estimation from preference-based feedback under label DP constraints
Abstract: Preference-based reinforcement learning (RL) has gained attention as a promising approach to align learning algorithms with human interests in various domains. Instead of relying on numerical rewards, preference-based RL uses feedback from human labelers in the form of pairwise or $K$-wise comparisons between actions. In this paper, we focus on reward learning in preference-based RL and address the problem of estimating the unknown parameters while protecting the labelers' privacy. We propose two estimators based on the Randomized Response strategy that ensure label differential privacy. The first estimator uses maximum likelihood estimation (MLE), while the second uses stochastic gradient descent (SGD). We show that both estimators achieve an estimation error of $\widetilde O(1/\sqrt{n})$ with $n$ samples. The additional cost of ensuring privacy for human labelers is proportional to $\frac{e^\epsilon + 1}{e^\epsilon - 1}$ in the best case, where $\epsilon > 0$ is the privacy parameter.
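Below is a minimal sketch (not the authors' code) of the kind of Randomized Response mechanism the abstract describes: each labeler's binary preference label is flipped with a fixed probability so the released label satisfies $\epsilon$-label differential privacy. The exact flipping probability $1/(e^\epsilon + 1)$ and the NumPy-based implementation are illustrative assumptions; the multiplicative cost factor $\frac{e^\epsilon + 1}{e^\epsilon - 1}$ is the one quoted in the abstract's error bound.

```python
# Illustrative sketch of Randomized Response for label differential privacy
# on pairwise preference labels. Assumptions (not from the paper): labels are
# binary {0, 1} and each is flipped independently with probability 1/(e^eps + 1).
import numpy as np

def randomized_response(labels: np.ndarray, epsilon: float, rng=None) -> np.ndarray:
    """Report each binary label truthfully with probability e^eps / (e^eps + 1),
    and flip it otherwise, which gives epsilon-label DP for each labeler."""
    rng = np.random.default_rng() if rng is None else rng
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    keep = rng.random(labels.shape) < p_keep
    return np.where(keep, labels, 1 - labels)

def privacy_cost_factor(epsilon: float) -> float:
    """The (e^eps + 1)/(e^eps - 1) factor that multiplies the non-private
    O(1/sqrt(n)) estimation error in the abstract's best-case bound."""
    return (np.exp(epsilon) + 1.0) / (np.exp(epsilon) - 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_labels = rng.integers(0, 2, size=1000)      # simulated pairwise-comparison labels
    private_labels = randomized_response(true_labels, epsilon=1.0, rng=rng)
    print("fraction of labels flipped:", np.mean(private_labels != true_labels))
    print("privacy cost factor at eps=1:", privacy_cost_factor(1.0))
```

A downstream MLE or SGD estimator would be fit on `private_labels`, with the flipping probability accounted for in the likelihood; the cost factor above quantifies how much the resulting estimation error inflates relative to the non-private case.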
Submission Number: 48