Abstract: Despite the recent success of deep reinforcement learning (RL), the generalization ability of RL agents remains an open problem for real-world applicability. RL agents trained on pixels may be completely derailed from achieving their objectives in unseen situations with different levels of visual change. However, numerous existing RL suites either do not address generalization as a primary objective or lack a consistent level design of increasing complexity. In this paper, we introduce the LevDoom benchmark, a suite of semi-realistic 3D simulation environments with coherent difficulty levels in the renowned video game Doom, designed to benchmark generalization in vision-based RL. We demonstrate how our benchmark reveals weaknesses of popular deep RL algorithms, which fail to generalize to modified environments. We further establish how our difficulty level design poses increasing complexity to these algorithms.