Hybrid Policies Using Inverse Rewards for Reinforcement Learning

Sep 27, 2018 ICLR 2019 Conference Blind Submission readers: everyone Show Bibtex
  • Abstract: This paper puts forward a broad-spectrum improvement for reinforcement learning algorithms, which combines the policies using original rewards and inverse (negative) rewards. The policies using inverse rewards are competitive with the original policies, and help the original policies correct their mis-actions. We have proved the convergence of the inverse policies. The experiments for some games in OpenAI gym show that the hybrid polices based on deep Q-learning, double Q-learning, and on-policy actor-critic obtain the rewards up to 63.8%, 97.8%, and 54.7% more than the original algorithms. The improved polices are more stable than the original policies as well.
  • Keywords: Reinforcement Learning, Rewards
  • TL;DR: A broad-spectrum improvement for reinforcement learning algorithms, which combines the policies using original rewards and inverse (negative) rewards
0 Replies

Loading