Hybrid Policies Using Inverse Rewards for Reinforcement Learning

Yao Shi; Tian Xia; Guanjun Zhao; Xin Gao

27 Sept 2018 (modified: 05 May 2023); ICLR 2019 Conference Blind Submission; Readers: Everyone

Abstract: This paper puts forward a broad-spectrum improvement for reinforcement learning algorithms that combines policies trained on the original rewards with policies trained on inverse (negative) rewards. The policies using inverse rewards are competitive with the original policies and help the original policies correct their mistaken actions. We prove the convergence of the inverse policies. Experiments on several games in OpenAI Gym show that the hybrid policies based on deep Q-learning, double Q-learning, and on-policy actor-critic obtain up to 63.8%, 97.8%, and 54.7% more reward, respectively, than the original algorithms. The improved policies are also more stable than the original policies.
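
The abstract does not say how the inverse policy is trained or how the two policies are combined, so the following is only a hypothetical sketch of one plausible reading, not the authors' method. It assumes tabular Q-learning, a made-up 5-state chain environment, an inverse learner trained on the negated reward -r with a min (instead of max) bootstrap so that its table tends toward -Q*, and a hybrid rule that averages the two estimates by acting greedily on (Q - Q_inv) / 2. The names `step`, `learn`, and `hybrid_action` and all constants are illustrative assumptions.

```python
import random

# Hypothetical toy environment (not from the paper): a 5-state chain.
N_STATES = 5          # states 0..4; state 4 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # discount factor (assumed)
ALPHA = 0.1           # learning rate (assumed)


def step(state, action):
    """Deterministic chain dynamics: reward 1.0 only on reaching the last state."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def learn(reward_sign, bootstrap, rng, episodes=3000, max_steps=100):
    """Tabular Q-learning on reward_sign * r.

    Original learner: reward_sign=+1, bootstrap=max  -> table tends to Q*.
    Inverse learner (assumption): reward_sign=-1, bootstrap=min -> table tends to -Q*.
    """
    table = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = rng.choice(ACTIONS)  # uniform random behaviour policy (off-policy learning)
            s2, r, done = step(s, a)
            r = reward_sign * r
            target = r if done else r + GAMMA * bootstrap(table[s2])
            table[s][a] += ALPHA * (target - table[s][a])
            s = s2
            if done:
                break
    return table


def hybrid_action(q, q_inv, s):
    """Assumed combination rule: average the two estimates of Q*, i.e. (Q - Q_inv) / 2."""
    scores = [(q[s][a] - q_inv[s][a]) / 2 for a in ACTIONS]
    return max(ACTIONS, key=lambda a: scores[a])


# Train each learner on its own, independently sampled experience.
q = learn(+1, max, random.Random(0))
q_inv = learn(-1, min, random.Random(1))
print([hybrid_action(q, q_inv, s) for s in range(N_STATES - 1)])
```

In this sketch the min bootstrap is what makes the inverse learner useful: because -r + γ·min(-Q*) = -(r + γ·max Q*), its fixed point is -Q*, so negating it yields a second, independently learned estimate of Q* that can be averaged with the first. On this toy chain both estimates converge and the hybrid policy moves right in every non-terminal state; whether averaging is how the paper actually combines the policies is an open assumption.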

Keywords: Reinforcement Learning, Rewards

TL;DR: A broad-spectrum improvement for reinforcement learning algorithms that combines policies using the original rewards with policies using inverse (negative) rewards.