Guided Adaptive Credit Assignment for Sample Efficient Policy Optimization

Anonymous

Sep 25, 2019 ICLR 2020 Conference Blind Submission readers: everyone Show Bibtex
  • Keywords: credit assignment, sparse reward, policy optimization, sample efficiency
  • TL;DR: A new and general credit assignment method for obtaining sample efficiency of policy optimization in sparse reward setting
  • Abstract: Policy gradient methods have achieved remarkable successes in solving challenging reinforcement learning problems. However, it still often suffers from sparse reward tasks, which leads to poor sample efficiency during training. In this work, we propose a guided adaptive credit assignment method to do effectively credit assignment for policy gradient methods. Motivated by entropy regularized policy optimization, our method extends the previous credit assignment methods by introducing more general guided adaptive credit assignment(GACA). The benefit of GACA is a principled way of utilizing off-policy samples. The effectiveness of proposed algorithm is demonstrated on the challenging \textsc{WikiTableQuestions} and \textsc{WikiSQL} benchmarks and an instruction following environment. The task is generating action sequences or program sequences from natural language questions or instructions, where only final binary success-failure execution feedback is available. Empirical studies show that our method significantly improves the sample efficiency of the state-of-the-art policy optimization approaches.
0 Replies

Loading