Abstract: This work considers the problem of deep reinforcement learning (RL) with long-term dependencies and sparse rewards, as found in many hard-exploration games. A graph-based representation is proposed that enables an agent to navigate the environment autonomously during exploration. The graph representation not only models the environment's structure effectively, but also efficiently traces the agent's state changes and the corresponding actions. By rewarding the agent with an influence-based curiosity bonus for novel game observations, the whole exploration task is divided into sub-tasks, which are solved effectively with a unified deep RL model. Experimental evaluations on hard-exploration Atari games demonstrate the effectiveness of the proposed method. The source code and learned models will be released to facilitate further study of this problem.
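The abstract does not specify how the graph is built or how the curiosity bonus is computed, so the following is only a minimal sketch under assumed choices: nodes are hashed raw observations, edges store the action linking two observations, and a fixed first-visit bonus stands in for the influence-based curiosity reward. The class name `ExplorationGraph` and all parameters are hypothetical.

```python
import hashlib
from collections import defaultdict

class ExplorationGraph:
    """Sketch of a graph memory for exploration (assumed design, not the paper's exact method).

    Nodes are hashed observations; directed edges record the action that led
    from one observation to the next, tracing state changes and actions.
    An intrinsic bonus is paid only the first time a node is visited.
    """

    def __init__(self, bonus=1.0):
        self.bonus = bonus              # curiosity bonus for a new node (assumed constant)
        self.edges = defaultdict(dict)  # node -> {action: successor node}
        self.visited = set()            # hashes of observations seen so far

    def _key(self, obs):
        # Hash the raw observation bytes; a learned embedding could be used instead.
        return hashlib.sha1(bytes(obs)).hexdigest()

    def step(self, obs, action, next_obs):
        """Record a transition and return the intrinsic (curiosity) reward."""
        u, v = self._key(obs), self._key(next_obs)
        self.edges[u][action] = v       # trace the state change and the action taken
        if v not in self.visited:       # novel observation -> pay the bonus
            self.visited.add(v)
            return self.bonus
        return 0.0
```

In such a scheme, the intrinsic reward returned by `step` would be added to the environment reward during training, and previously recorded edges could be replayed to navigate back to a frontier node before exploring further, which is one way the graph could support the self-navigation and sub-task decomposition described above.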