Abstract: Pommerman is a recently proposed multi-agent benchmark that is very challenging for Reinforcement Learning (RL). The main obstacles to applying RL in Pommerman are delayed action effects and sparse rewards. This paper presents novel approaches to mitigating these problems by introducing the Artificial Potential Field (APF) into the two-dimensional Pommerman world. We propose a new framework that generates hybrid features from both APF computation and raw environment data. In addition, a new APF-based reward-shaping method is developed to give the learning agent faster and more efficient policy iteration. Training results show that both learning speed and converged reward improve in the 1v1 mode of Pommerman, compared to the conventional learning algorithms A2C and ACKTR.
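The abstract describes shaping rewards via an Artificial Potential Field over the 2D grid. As a rough illustration only (the paper's actual potential function, coefficients, and feature pipeline are not given here), the sketch below combines a hypothetical attractive term toward a goal cell with a repulsive term near bomb cells, and applies standard potential-based reward shaping of the form r + γΦ(s') − Φ(s); all names and parameters are assumptions, not the authors' method.

```python
import math

def apf_potential(pos, goal, bombs, k_att=1.0, k_rep=2.0, rep_radius=2.0):
    """Hypothetical APF value at grid cell `pos`.

    Attractive toward `goal` (less negative when closer), repulsive
    within `rep_radius` of each bomb. Coefficients are illustrative.
    """
    # Attractive term: potential decreases with distance to the goal.
    att = -k_att * math.dist(pos, goal)
    # Repulsive term: penalize cells close to any bomb.
    rep = 0.0
    for bomb in bombs:
        d = math.dist(pos, bomb)
        if d < rep_radius:
            rep -= k_rep * (rep_radius - d)
    return att + rep

def shaped_reward(r, s, s_next, goal, bombs, gamma=0.99):
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return (r
            + gamma * apf_potential(s_next, goal, bombs)
            - apf_potential(s, goal, bombs))
```

With this form, a step toward the goal (and away from bombs) yields a positive shaping bonus even when the environment reward is zero, which is the intuition behind using APF to densify Pommerman's sparse rewards.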