Differentially Private No-regret Exploration in Adversarial Markov Decision Processes

Published: 26 Apr 2024, Last Modified: 15 Jul 2024, UAI 2024 poster, CC BY 4.0
Keywords: Differential Privacy, Adversarial Markov Decision Process, Regret Minimization
Abstract: We study learning in adversarial Markov decision processes (MDPs) in the episodic setting under the constraint of differential privacy (DP). This is motivated by the widespread application of reinforcement learning (RL) in non-stationary and even adversarial scenarios where users' sensitive information must be protected. We first propose two efficient frameworks for adversarial MDPs, covering the full-information and bandit-feedback settings. Within each framework, we consider both Joint DP (JDP), where a trusted central agent protects the sensitive data, and Local DP (LDP), where the information is protected directly on the user side. We then design novel privacy mechanisms to privatize the stochastic transitions and the adversarial losses. Instantiating these mechanisms to satisfy the JDP and LDP requirements yields near-optimal regret guarantees for both frameworks. To our knowledge, these are the first algorithms to tackle private learning in adversarial MDPs.
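The difference between a trusted central agent (JDP) and user-side protection (LDP) can be illustrated with the textbook Laplace mechanism applied to transition counts n(s, a, s'). This is a minimal sketch under illustrative assumptions, not the privacy mechanisms proposed in the paper: a one-shot Laplace release stands in for the counter-style mechanisms typically used for JDP, neighbouring datasets differ in one user's length-H episode, losses are ignored, and every function name (`transition_counts`, `central_release`, `local_release`) is hypothetical.

```python
import random

def transition_counts(trajectory, S, A):
    """Flattened counts n(s, a, s') for one user's episode, a list of (s, a, s_next) triples."""
    counts = [0.0] * (S * A * S)
    for s, a, s_next in trajectory:
        counts[(s * A + a) * S + s_next] += 1.0
    return counts

def laplace(scale, rng):
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def central_release(trajectories, S, A, H, epsilon, rng):
    """Trusted-curator release (loosely, the JDP setting): aggregate raw counts, add noise once."""
    total = [0.0] * (S * A * S)
    for traj in trajectories:
        for i, c in enumerate(transition_counts(traj, S, A)):
            total[i] += c
    # Assumption: replacing one user's length-H episode moves the aggregate by at most 2H in L1.
    scale = 2.0 * H / epsilon
    return [c + laplace(scale, rng) for c in total]

def local_release(trajectories, S, A, H, epsilon, rng):
    """Local release (the LDP setting): each user perturbs their own counts before sending."""
    total = [0.0] * (S * A * S)
    # Any two length-H episodes yield count vectors at most 2H apart in L1 norm.
    scale = 2.0 * H / epsilon
    for traj in trajectories:
        for i, c in enumerate(transition_counts(traj, S, A)):
            total[i] += c + laplace(scale, rng)  # noise is added before aggregation
    return total

def mean_abs_error(estimate, truth):
    return sum(abs(e - t) for e, t in zip(estimate, truth)) / len(truth)

# Usage on synthetic data: K users, each contributing one episode of H random triples.
rng = random.Random(0)
S, A, H, K, epsilon = 2, 2, 3, 200, 1.0
trajectories = [
    [(rng.randrange(S), rng.randrange(A), rng.randrange(S)) for _ in range(H)]
    for _ in range(K)
]
truth = [sum(col) for col in zip(*(transition_counts(t, S, A) for t in trajectories))]
central_err = mean_abs_error(central_release(trajectories, S, A, H, epsilon, rng), truth)
local_err = mean_abs_error(local_release(trajectories, S, A, H, epsilon, rng), truth)
print(f"central error {central_err:.1f}, local error {local_err:.1f}")
```

In the local release each count entry accumulates K independent noise draws, so its error grows roughly like the square root of K times the central one; this is the usual price of not trusting a central agent. Visit counts n(s, a) can be recovered from the noisy n(s, a, s') by summing over s', which is post-processing and costs no extra privacy.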
List of Authors: Shaojie Bai, Lanting Zeng, Chengcheng Zhao, Xiaoming Duan, Mohammad Sadegh Talebi, Peng Cheng, Jiming Chen
Submission Number: 512