Model-Free Counterfactual Credit Assignment

Sep 28, 2020 (edited Mar 05, 2021)
ICLR 2021 Conference Blind Submission
  • Keywords: credit assignment, model-free RL, causality, hindsight
  • Abstract: Credit assignment in reinforcement learning is the problem of measuring an action's influence on future rewards. In particular, this requires separating skill from luck, i.e., disentangling the effect of an action on rewards from that of external factors and subsequent actions. To achieve this, we adapt the notion of counterfactuals from causality theory to a model-free RL setup. The key idea is to condition value functions on future events, by learning to extract relevant information from a trajectory. We then propose to use these as future-conditional baselines and critics in policy gradient algorithms, and we develop a valid, practical variant with provably lower variance, which achieves unbiasedness by constraining the hindsight information not to contain information about the agent's actions. We demonstrate the efficacy and validity of our algorithm on a number of illustrative problems.
  • One-sentence Summary: Under an appropriate action-independence constraint, future-conditional baselines are valid to use in policy gradients and lead to drastically reduced variance and faster learning in certain environments with difficult credit assignment.
  • Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
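The abstract's central claim is that a baseline conditioned on hindsight information stays unbiased in a policy gradient only when that information is independent of the agent's action. This can be checked numerically on a toy problem. The sketch below is a hypothetical one-step example, not the paper's algorithm. In it, the action A ~ Bernoulli(sigmoid(theta)) stands for skill, independent Gaussian noise eps stands for luck, and the reward is R = A + eps. The function name `estimate_gradients` and its parameters `theta`, `noise_std`, `n`, and `seed` are illustrative assumptions. The hindsight statistic is chosen by hand, and the baseline E[R | Phi] is computed in closed form rather than learned.

```python
import numpy as np


def estimate_gradients(theta=0.0, noise_std=5.0, n=200_000, seed=0):
    """Compare score-function gradient estimators on a hypothetical toy problem.

    Assumed setup (illustrative only):
      A   ~ Bernoulli(p), p = sigmoid(theta)   -> the agent's action ("skill")
      eps ~ N(0, noise_std^2), independent of A -> external factor ("luck")
      R   = A + eps
    True gradient: d/dtheta E[R] = p * (1 - p).
    Returns the true gradient and (mean, variance) of each estimator.
    """
    rng = np.random.default_rng(seed)
    p = 1.0 / (1.0 + np.exp(-theta))
    a = (rng.random(n) < p).astype(float)
    eps = rng.normal(0.0, noise_std, size=n)
    r = a + eps

    # d/dtheta log pi(A) for a Bernoulli policy with a sigmoid parameterization.
    score = a - p

    # 1) Plain REINFORCE, no baseline: unbiased but noisy, because eps leaks in.
    g_plain = score * r

    # 2) Hindsight baseline with Phi = eps. Phi is independent of A, so
    #    E[score * b(Phi)] = 0 and the estimator remains unbiased.
    #    b(Phi) = E[R | Phi] = p + Phi (closed form here; learned in practice).
    g_hindsight = score * (r - (p + eps))

    # 3) "Leaky" hindsight with Phi = R, which carries information about A.
    #    b(Phi) = E[R | R] = R cancels every sample, so the estimator is biased.
    g_leaky = score * (r - r)

    true_grad = p * (1.0 - p)
    return true_grad, {
        "plain": (g_plain.mean(), g_plain.var()),
        "hindsight": (g_hindsight.mean(), g_hindsight.var()),
        "leaky": (g_leaky.mean(), g_leaky.var()),
    }


if __name__ == "__main__":
    true_grad, results = estimate_gradients()
    print(f"true gradient: {true_grad:.4f}")
    for name, (mean, var) in results.items():
        print(f"{name:>9}: mean={mean:.4f}  var={var:.4f}")
```

With `theta=0` the true gradient is 0.25. The action-independent hindsight baseline removes the luck term entirely, so its variance is essentially zero while the plain estimator's variance is large. The leaky baseline's mean is 0 instead of 0.25, which shows why the action-independence constraint is needed. In the paper's setting, the hindsight statistic is learned from the trajectory and its independence from the action is enforced rather than assumed.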