Offline Reward Perturbation Boosts Distributional Shift in Online RL

Published: 26 Apr 2024 · Last Modified: 15 Jul 2024 · UAI 2024 poster · CC BY 4.0
Keywords: data poisoning attack, machine learning safety, offline to online reinforcement learning
Abstract: Offline-to-online reinforcement learning has recently been shown to be effective in reducing online sample complexity by first training on offline-collected data. However, this additional data source may also invite new poisoning attacks that target offline training. In this work, we reveal such a vulnerability in $\textit{critic-regularized}$ offline RL by proposing a novel data poisoning attack that is stealthy in the sense that performance during offline training remains intact, while the online fine-tuning stage suffers a significant performance drop. Our method leverages bi-level optimization to promote over-estimation/distribution shift during offline-to-online reinforcement learning. Experiments on four environments confirm that the attack satisfies this new stealthiness requirement and remains effective with only a small perturbation budget and without white-box access to the victim model.
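The abstract outlines a bi-level reward-poisoning idea: perturb offline rewards within a small budget so that the critic-regularized offline phase looks unchanged, while online fine-tuning is pushed toward over-estimation/distribution shift. The following is a minimal, self-contained sketch of that general recipe under assumed toy definitions; the data, the CQL-style critic penalty, the one-step unrolling, and all names (EPSILON, inner_loss, outer_loss, etc.) are illustrative assumptions, not the authors' algorithm or released code.

# Illustrative sketch only -- NOT the paper's method or released code.
# Under assumed toy definitions, it optimizes a bounded reward perturbation
# with a one-step-unrolled bi-level objective: keep the poisoned critic close
# to a clean reference on dataset actions (stealthiness proxy) while inflating
# Q-values on out-of-distribution actions (over-estimation / shift proxy).
import torch

torch.manual_seed(0)

S_DIM, A_DIM, N = 4, 2, 256
EPSILON = 0.1            # per-transition reward-perturbation budget (assumed)
INNER_LR, OUTER_LR = 1e-2, 1e-2
GAMMA, CQL_ALPHA = 0.99, 1.0

# Toy offline dataset of transitions (stand-in for the victim's data).
s, a = torch.randn(N, S_DIM), torch.randn(N, A_DIM)
r, s_next = torch.randn(N), torch.randn(N, S_DIM)

def init_critic():
    def p(*shape):
        return (0.1 * torch.randn(*shape)).requires_grad_(True)
    return {"w1": p(S_DIM + A_DIM, 64), "b1": p(64), "w2": p(64, 1), "b2": p(1)}

def q_value(params, s_in, a_in):
    h = torch.relu(torch.cat([s_in, a_in], -1) @ params["w1"] + params["b1"])
    return (h @ params["w2"] + params["b2"]).squeeze(-1)

def inner_loss(params, delta):
    # Critic-regularized offline objective (CQL-style penalty as a proxy),
    # trained on the *perturbed* rewards r + delta. The dataset action is
    # reused as a crude stand-in for the behavior policy's next action.
    target = (r + delta) + GAMMA * q_value(params, s_next, a).detach()
    td = ((q_value(params, s, a) - target) ** 2).mean()
    conservative = (q_value(params, s, torch.randn_like(a)).mean()
                    - q_value(params, s, a).mean())
    return td + CQL_ALPHA * conservative

def sgd_steps(params, delta, n_steps):
    # Plain (non-differentiated) SGD on the inner critic objective.
    for _ in range(n_steps):
        grads = torch.autograd.grad(inner_loss(params, delta), list(params.values()))
        with torch.no_grad():
            for k, g in zip(params, grads):
                params[k] -= INNER_LR * g
    return params

# Reference critic trained on clean rewards: stealthiness target.
q_ref = q_value(sgd_steps(init_critic(), torch.zeros(N), 200), s, a).detach()

def outer_loss(params):
    # Attacker objective: inflate Q on out-of-distribution actions while
    # keeping Q on dataset actions close to the clean reference.
    overestimate = -q_value(params, s, torch.randn_like(a)).mean()
    stealth = (q_value(params, s, a) - q_ref).pow(2).mean()
    return overestimate + stealth

delta = torch.zeros(N, requires_grad=True)
for step in range(30):
    # Inner problem: (re)train the critic on the currently poisoned rewards.
    params = sgd_steps(init_critic(), delta.detach(), 50)
    # One extra unrolled step keeps the graph from the critic back to delta.
    grads = torch.autograd.grad(inner_loss(params, delta), list(params.values()),
                                create_graph=True)
    unrolled = {k: p - INNER_LR * g for (k, p), g in zip(params.items(), grads)}
    # Outer problem: gradient step on delta, projected onto the budget.
    d_grad, = torch.autograd.grad(outer_loss(unrolled), delta)
    with torch.no_grad():
        delta -= OUTER_LR * d_grad
        delta.clamp_(-EPSILON, EPSILON)

print("max |reward perturbation| =", delta.abs().max().item())

In this toy setup the perturbation stays within the assumed budget EPSILON while the outer objective trades off over-estimation on out-of-distribution actions against matching the clean critic on dataset actions; the paper's actual formulation, constraints, and threat model should be taken from the full text.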
List Of Authors: Yu, Zishun and Kang, Siteng and Zhang, Xinhua
Latex Source Code: zip
Signed License Agreement: pdf
Submission Number: 758