Guided Offline RL Using a Safety Expert
Abstract: Offline reinforcement learning is used to train policies in situations where it is expensive or infeasible to access
the environment during training. An agent trained in such a scenario does not receive corrective feedback once
the learned policy starts diverging and may fall prey to the overestimation bias commonly seen in this setting.
This increases the chances of the agent choosing unsafe/risky actions, especially in states with sparse to no
representation in the training dataset. In this paper, we propose to leverage a safety expert to discourage the
offline RL agent from choosing unsafe actions in under-represented states in the dataset. The proposed framework
transfers the safety expert's knowledge, in an offline setting, to states with high uncertainty in order to
prevent catastrophic failures in safety-critical domains. We use a simple but effective approach that quantifies
a state's uncertainty by how frequently it appears in the training dataset. In states with high uncertainty,
the offline RL agent mimics the safety expert while maximizing the long-term reward. As part of the proposed
approach, we modify TD3+BC, an existing offline RL algorithm. We demonstrate empirically that
our approach performs better than TD3+BC on some control tasks and comparably on others across two sets of
benchmark datasets while reducing the chance of taking unsafe actions in sparse regions of the state space.
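As a rough illustration only, and not the authors' exact formulation, the sketch below shows one way a frequency-based uncertainty gate could redirect TD3+BC's behavior-cloning term toward a safety expert's action in sparsely represented states; the names guided_policy_loss, safety_expert, uncertainty, and the threshold tau are hypothetical and assume PyTorch-style actor and critic modules.

import torch

def guided_policy_loss(actor, critic, safety_expert, states, dataset_actions,
                       uncertainty, alpha=2.5, tau=0.8):
    # Actor's action and its value under the learned critic.
    pi = actor(states)
    q = critic(states, pi)
    lam = alpha / q.abs().mean().detach()  # TD3+BC's Q-normalization weight

    # gate = 1 for sparsely represented (high-uncertainty) states, else 0;
    # `uncertainty` is assumed to come from a frequency/count estimate over
    # the offline dataset, as described in the abstract.
    gate = (uncertainty > tau).float().unsqueeze(-1)

    # Behavior-cloning terms: toward the dataset action in well-covered
    # states, toward the safety expert's action in uncertain states.
    bc_data = ((pi - dataset_actions) ** 2).mean(dim=-1, keepdim=True)
    bc_safe = ((pi - safety_expert(states)) ** 2).mean(dim=-1, keepdim=True)

    # Maximize Q everywhere while regularizing toward the appropriate target.
    return (-lam * q + gate * bc_safe + (1.0 - gate) * bc_data).mean()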