Abstract: Deep Reinforcement Learning (DRL) has achieved impressive success across a wide range of domains. However, it still suffers from sample inefficiency, requiring massive numbers of training samples to learn an optimal policy. Furthermore, the learned policy is highly dependent on the training environment, which limits its generalization. In this paper, we propose the Planner-guided RL (PRL) approach to explore how symbolic planning can help DRL in terms of efficiency and generalization. PRL is a two-level structure that incorporates any symbolic planner as the meta-controller to derive subgoals, while the low-level controller learns how to achieve those subgoals. We evaluate PRL on Montezuma's Revenge, and the results show that PRL outperforms previous hierarchical methods. The evaluation of generalization is a work in progress.
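As a rough illustration of the two-level structure described in the abstract, the sketch below pairs a stand-in symbolic planner (the meta-controller, which derives subgoals) with a tabular low-level controller trained on intrinsic subgoal rewards. This is a minimal sketch under assumptions: the names (`symbolic_planner`, `LowLevelController`, `run_episode`), the toy subgoal plan, and the environment interface are hypothetical and are not taken from the paper's implementation, which would use a DRL agent (e.g. a DQN) as the low-level controller.

```python
import random
from typing import List, Tuple

# Hypothetical meta-controller: a symbolic planner mapping the current
# symbolic state to an ordered list of subgoals. Any off-the-shelf planner
# could fill this role; this hard-coded table is for illustration only.
def symbolic_planner(symbolic_state: str) -> List[str]:
    plan = {
        "start":     ["get_key", "open_door", "reach_exit"],
        "has_key":   ["open_door", "reach_exit"],
        "door_open": ["reach_exit"],
    }
    return plan.get(symbolic_state, [])

# Low-level controller: a tabular Q-learner conditioned on the current
# subgoal. In practice this would be a DRL policy (e.g. a DQN).
class LowLevelController:
    def __init__(self, actions: List[int], alpha=0.1, gamma=0.99, eps=0.1):
        self.q = {}          # (subgoal, state, action) -> estimated value
        self.actions = actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, subgoal: str, state: Tuple) -> int:
        # Epsilon-greedy action selection for the current subgoal.
        if random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions,
                   key=lambda a: self.q.get((subgoal, state, a), 0.0))

    def update(self, subgoal, state, action, reward, next_state):
        # One-step Q-learning update on the subgoal-conditioned value table.
        key = (subgoal, state, action)
        best_next = max(self.q.get((subgoal, next_state, a), 0.0)
                        for a in self.actions)
        old = self.q.get(key, 0.0)
        self.q[key] = old + self.alpha * (reward + self.gamma * best_next - old)

def run_episode(env, controller, get_symbolic_state, subgoal_reached,
                max_steps=1000):
    """Sketch of the two-level loop: the planner (meta-controller) proposes
    subgoals; the low-level controller pursues each one with an intrinsic
    reward of +1 on completion. `env`, `get_symbolic_state`, and
    `subgoal_reached` are assumed interfaces, not part of the paper."""
    state = env.reset()
    for subgoal in symbolic_planner(get_symbolic_state(state)):
        for _ in range(max_steps):
            action = controller.act(subgoal, state)
            next_state, _, done, _ = env.step(action)
            intrinsic = 1.0 if subgoal_reached(subgoal, next_state) else 0.0
            controller.update(subgoal, state, action, intrinsic, next_state)
            state = next_state
            if intrinsic > 0 or done:
                break
        if done:
            break
```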
Submission Number: 71