Survival Instinct in Offline Reinforcement Learning and Implicit Human Bias in Data

Published: 20 Jun 2023, Last Modified: 11 Jul 2023
Venue: ILHF Workshop ICML 2023
Keywords: Offline RL, Safe RL, Implicit Data Bias
TL;DR: We find that offline RL can produce surprisingly good policies even when trained on wrong reward labels. We provide explanations and discuss practical implications.
Abstract: We present a novel observation about the behavior of offline reinforcement learning (RL) algorithms: on many benchmark datasets, offline RL can produce well-performing and safe policies even when trained with "wrong" reward labels, such as those that are zero everywhere or are negatives of the true rewards. This phenomenon cannot be easily explained by offline RL's return maximization objective. Moreover, it gives offline RL a degree of robustness that is uncharacteristic of its online RL counterparts, which are known to be sensitive to reward design. We demonstrate that this surprising robustness property is attributable to an interplay between the notion of *pessimism* in offline RL algorithms and a certain human bias implicit in common data collection practices. As we prove in this work, pessimism endows the agent with a *survival instinct*, i.e., an incentive to stay within the data support in the long term, while the limited and biased data coverage further constrains the set of survival policies. We argue that the survival instinct should be taken into account when interpreting results from existing offline RL benchmarks and when creating future ones. Our empirical and theoretical results suggest a new paradigm for RL, whereby an agent is "nudged" to learn a desirable behavior with imperfect reward but purposely biased data coverage.
Submission Number: 32
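
The survival-instinct mechanism described in the abstract can be illustrated with a small toy sketch. This is not the paper's algorithm or one of its experiments: the 5-state chain, the `SUPPORT` set, the `penalty` parameter, and all function names are hypothetical, and pessimism is modeled in a simplified way by assigning a worst-case value to every state-action pair outside the data. Under these assumptions, an agent trained on all-zero rewards still prefers actions that keep it inside the data support.

```python
# Toy sketch (all names and numbers are illustrative assumptions).
GAMMA = 0.9
N_STATES = 5
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Deterministic chain dynamics, clipped at both ends."""
    return min(max(state + action, 0), N_STATES - 1)


# Assumed data support: the data collector only moved among states 1, 2, 3.
SUPPORT = {(1, +1), (2, -1), (2, +1), (3, -1)}


def pessimistic_q(reward_fn, penalty=1.0, iters=200):
    """Value iteration that assigns a worst-case value off the data support."""
    v_min = -penalty / (1.0 - GAMMA)  # pessimistic return for unseen pairs
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(iters):
        v = {s: max(q[(s, a)] for a in ACTIONS) for s in range(N_STATES)}
        for s in range(N_STATES):
            for a in ACTIONS:
                if (s, a) in SUPPORT:
                    q[(s, a)] = reward_fn(s, a) + GAMMA * v[step(s, a)]
                else:
                    q[(s, a)] = v_min
    return q


def greedy_policy(q):
    """Pick the action with the highest pessimistic Q-value in each state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES)}


def rollout(policy, start, horizon=20):
    """Follow the policy from a start state and record the visited states."""
    states = [start]
    for _ in range(horizon):
        states.append(step(states[-1], policy[states[-1]]))
    return states


def zero_reward(state, action):
    """A 'wrong' reward label: zero everywhere."""
    return 0.0


policy = greedy_policy(pessimistic_q(zero_reward))
visited = rollout(policy, start=2)
print(visited)
```

Because every in-support action has value 0 and every out-of-support action has a strictly negative value, the greedy policy never leaves states 1 to 3 when started inside them, even though the reward carries no information. Which in-support behavior results is then determined by the data coverage alone, which is the role the abstract attributes to biased data collection.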