Keywords: Reinforcement Learning, Off-policy Evaluation, Adversarial Attacks, Data Poisoning Attack, Offline RL
Abstract: Off-policy Evaluation (OPE) methods are crucial for evaluating policies in high-stakes domains such as healthcare, where exploration is often infeasible or expensive. However, the extent to which such methods can be trusted under adversarial threats to data quality is largely unexplored. In this work, we make the first attempt at investigating the sensitivity of OPE methods to adversarial perturbations to the data. We design a data poisoning attack framework that leverages influence functions to construct perturbations that maximize error in the policy value estimates. Our experimental results show that many OPE methods are highly prone to data poisoning attacks, even for small adversarial perturbations.