Keywords: Adversarial Machine Learning, Certified Robustness, Reinforcement Learning, Poisoning Attack
TL;DR: We strengthen certified defenses for offline RL against poisoning attacks in more general RL settings via Differential Privacy, achieving much greater robustness than previous methods.
Abstract: Like other machine learning frameworks, Offline Reinforcement Learning (RL) has been shown to be vulnerable to poisoning attacks due to its reliance on externally sourced datasets, a vulnerability that is exacerbated by its sequential nature. To mitigate the risks posed by RL poisoning, we extend certified defenses to provide stronger guarantees against adversarial manipulation, ensuring robustness for both per-state actions and the overall expected cumulative reward. Our approach leverages properties of Differential Privacy in a manner that allows it to span both continuous and discrete spaces, as well as stochastic and deterministic environments, significantly expanding the scope and applicability of achievable guarantees. Empirical evaluations demonstrate that our approach limits the performance drop to no more than 50% with up to 7% of the training data poisoned, a significant improvement over the 0.008% tolerated in prior work (Wu et al., 2022), while also producing certified radii that are 5 times larger. This highlights the potential of our framework to enhance safety and reliability in offline RL.
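For intuition only: the submission itself contains no code here, and the sketch below is not the authors' DP-based mechanism. It illustrates the partition-and-aggregate idea underlying aggregation-based certified defenses of the kind the abstract builds on (cf. Wu et al., 2022): train one sub-policy per disjoint data partition so that each poisoned sample can affect at most one sub-policy, then certify the per-state action via the vote margin. All names (`partition_dataset`, `certified_action`, `num_partitions`) are hypothetical.

```python
# Illustrative sketch, not the paper's method: partition-and-vote
# certification of per-state actions for a discrete action space.
import numpy as np

def partition_dataset(transitions, num_partitions, seed=0):
    """Hash each transition into one of `num_partitions` disjoint subsets,
    so a single poisoned sample can influence at most one sub-policy."""
    rng = np.random.default_rng(seed)
    assignment = rng.integers(0, num_partitions, size=len(transitions))
    return [[t for t, p in zip(transitions, assignment) if p == k]
            for k in range(num_partitions)]

def certified_action(policies, state, num_actions):
    """Majority vote over sub-policies (assumes num_actions >= 2).
    Corrupting one partition shifts at most one vote, changing the margin
    by at most 2, so the winning action is unchanged as long as at most
    margin // 2 partitions -- hence at most that many poisoned samples --
    are corrupted (exact tie-breaking aside)."""
    votes = np.zeros(num_actions, dtype=int)
    for policy in policies:
        votes[policy(state)] += 1
    ranked = np.argsort(votes)[::-1]
    best, runner_up = ranked[0], ranked[1]
    margin = votes[best] - votes[runner_up]
    return int(best), int(margin // 2)  # (action, certified radius)

# Toy usage: sub-policies are arbitrary state -> action callables here.
policies = [lambda s, a=a: a for a in [0, 0, 0, 1, 2]]
action, radius = certified_action(policies, state=None, num_actions=3)
# action == 0, certified against radius == 1 corrupted partition.
```

The disjoint partitioning is what converts a bound on corrupted sub-policies into a bound on poisoned training samples; DP-based analyses of the kind the abstract describes generalize this style of guarantee beyond discrete majority votes.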
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3473