Building Generalizable Agents with a Realistic and Rich 3D Environment


Nov 07, 2017 (modified: Nov 07, 2017) ICLR 2018 Conference Blind Submission readers: everyone
  • Abstract: To bridge the gap between machine and human intelligence, it is of utmost importance to introduce environments that are visually realistic and rich in content. In this work, we build House3D, a rich, extensible and interactive environment that contains over 45,000 human-designed 3D scenes of houses, ranging from single-room studios to multi-storeyed houses, equipped with a diverse set of fully labeled 3D objects, textures and scene layouts, based on the SUNCG dataset. Using a subset of houses in this environment, we study the task of RoomNav, in which an agent navigates towards a target specified by a high-level instruction. To do so, the agent must learn to comprehend the scene it lives in by developing perception, understand the instruction by mapping it to the correct semantics, and navigate to the target while obeying the underlying physical laws. We tackle this problem by training RL agents with gated-attention networks and show that our trained agents succeed in new, unseen environments, demonstrating generalization capability. In particular, we observe that (1) training on larger house sets is substantially harder but yields better generalization, (2) using semantic signals (e.g., segmentation masks) boosts generalization performance, and (3) the gating mechanism helps more with training than with generalization. We hope that House3D, together with the analysis of the RoomNav task, acts as a step towards building practical intelligent systems and can potentially benefit the community.
  • Keywords: reinforcement learning, navigation, generalization, 3D scenes
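
The abstract mentions gated-attention networks without describing their structure. As a rough illustration only, the sketch below shows one common form of gating, in which an instruction embedding is mapped to one sigmoid gate per feature channel and each channel of the visual features is scaled by its gate. This is not the authors' architecture: the function names, shapes, weights and toy values are all hypothetical and chosen for readability.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_gates(instruction_embedding, weight, bias):
    """Map an instruction embedding to one gate per feature channel.

    weight has one row per channel; each row has the same length as the
    embedding. All of these are hypothetical parameters for illustration.
    """
    gates = []
    for row, b in zip(weight, bias):
        logit = sum(w * e for w, e in zip(row, instruction_embedding)) + b
        gates.append(sigmoid(logit))
    return gates


def gated_attention(features, gates):
    """Scale every value in channel c of the feature maps by gates[c]."""
    return [
        [[g * value for value in row] for row in channel]
        for channel, g in zip(features, gates)
    ]


# Toy inputs (assumed): 2 channels of 2x2 features, 3-dimensional embedding.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
embedding = [1.0, 0.0, -1.0]
weight = [
    [0.0, 0.0, 0.0],     # logit 0, so the gate is exactly 0.5
    [10.0, 0.0, -10.0],  # large positive logit, so the gate is close to 1
]
bias = [0.0, 0.0]

gates = channel_gates(embedding, weight, bias)
gated = gated_attention(features, gates)
```

In this toy case the first channel is halved and the second passes through almost unchanged, which is the sense in which the instruction selects which visual features the policy attends to.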