Keywords: machine learning, reinforcement learning, privacy, differential privacy, deep learning, model-based, offline
TL;DR: We address deep offline reinforcement learning with differential privacy guarantees, using a model-based approach.
Abstract: We address deep offline reinforcement learning with privacy guarantees, where the goal is to train a policy that is differentially private with respect to individual trajectories in the dataset. To achieve this, we introduce DP-MORL, a model-based reinforcement learning (MBRL) algorithm with differential privacy guarantees. A private model of the environment is first learned from offline data using DP-FedAvg, a training method for neural networks that provides differential privacy guarantees at the trajectory level. Then, we use model-based policy optimization to derive a policy from the (penalized) private model, without any further interaction with the system or access to the dataset. We empirically show that DP-MORL enables the training of private RL agents from offline data in continuous control tasks, and we outline the price of privacy in this setting.
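As a rough illustration of what trajectory-level privacy means for model training, the sketch below shows one clip-and-noise aggregation step in the general style of DP-FedAvg, where each trajectory contributes a single model update. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the parameters `clip_norm`, `noise_multiplier`, and `expected_count`, and the use of NumPy are all hypothetical choices made for illustration.

```python
import numpy as np


def clip_update(update, clip_norm):
    """Rescale one trajectory's update so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(update)
    scale = min(1.0, clip_norm / (norm + 1e-12))
    return update * scale


def private_aggregate(per_trajectory_updates, clip_norm, noise_multiplier,
                      expected_count, rng):
    """Sum clipped per-trajectory updates, add Gaussian noise, then normalize.

    Assumptions (hypothetical, for illustration only):
    - each element of per_trajectory_updates is the model update computed
      from a single trajectory, as a flat NumPy array;
    - expected_count is a fixed denominator (for example the expected number
      of sampled trajectories), so the normalization does not depend on the
      data actually sampled.
    """
    clipped = [clip_update(u, clip_norm) for u in per_trajectory_updates]
    total = np.sum(clipped, axis=0)
    # Noise scale is tied to the clipping bound: changing one trajectory
    # changes `total` by at most clip_norm in L2 norm.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / expected_count


rng = np.random.default_rng(0)
updates = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]
noisy_mean = private_aggregate(updates, clip_norm=1.0, noise_multiplier=1.0,
                               expected_count=2, rng=rng)
```

Clipping bounds how much any single trajectory can move the aggregate, and the Gaussian noise is calibrated to that bound; the resulting privacy budget depends on the noise multiplier, the sampling rate, and the number of training rounds, none of which are specified here.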
Submission Number: 77