Resolving Causal Confusion in Reinforcement Learning via Robust Exploration

Mar 09, 2021 (edited Apr 15, 2021) · ICLR 2021 Workshop SSL-RL Blind Submission
  • Abstract: A reinforcement learning agent must distinguish between spurious correlations and causal relationships in its environment in order to robustly achieve its goals. Contrary to popular belief, such cases of causal confusion can occur in online reinforcement learning (RL) settings. We demonstrate this, and show how causal confusion can lead to catastrophic failure under even mild forms of distribution shift. We formalize the problem of identifying causal structure in a Markov Decision Process, and highlight the central role played by the data collection policy in identifying and avoiding spurious correlations. We find that many RL algorithms, including those with PAC-MDP guarantees, fall prey to causal confusion when the exploration policy is insufficient. To address this, we present a robust exploration strategy that enables causal hypothesis-testing through interaction with the environment. Our method outperforms existing state-of-the-art approaches at avoiding causal confusion, improving robustness and generalization in a range of tasks.
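To make the abstract's core failure mode concrete, here is a minimal illustrative sketch (not the paper's method or environments): a toy one-step decision problem in which an observation feature is spuriously correlated with the optimal action under a narrow data-collection policy. A policy that latches onto the spurious feature looks perfect on the training distribution but fails once the correlation breaks; all names and the environment itself are hypothetical.

```python
import random

def optimal_action(state):
    # True causal rule: the reward-maximizing action is the state's parity.
    return state % 2

def make_dataset(n, spurious_matches_action=True, seed=0):
    """Collect (observation, optimal_action) pairs.

    Under insufficient exploration (spurious_matches_action=True), a nuisance
    feature in the observation always happens to equal the optimal action,
    creating a spurious correlation. Under distribution shift, it is random.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        state = rng.randrange(10)
        a_star = optimal_action(state)
        spurious = a_star if spurious_matches_action else rng.randrange(2)
        data.append(((state, spurious), a_star))
    return data

def confused_policy(obs):
    # Latches onto the spurious feature (obs[1]) instead of the true cause.
    return obs[1]

def causal_policy(obs):
    # Uses the true cause: the state itself (obs[0]).
    return optimal_action(obs[0])

def accuracy(policy, data):
    return sum(policy(obs) == a for obs, a in data) / len(data)

train = make_dataset(1000, spurious_matches_action=True)
test = make_dataset(1000, spurious_matches_action=False, seed=1)

# Both policies are indistinguishable on the training distribution...
assert accuracy(confused_policy, train) == 1.0
assert accuracy(causal_policy, train) == 1.0
# ...but only the causal policy survives even this mild distribution shift;
# the confused policy degrades to roughly chance-level performance.
assert accuracy(causal_policy, test) == 1.0
assert accuracy(confused_policy, test) < 0.6
```

This mirrors the abstract's point that the data collection policy determines which correlations are identifiable: an exploration strategy that sometimes decouples the nuisance feature from the optimal action would expose the confused hypothesis during training rather than at deployment.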