Abstract: Humans can quickly navigate to and localize a target object, even in novel scenes. We argue that this ability is mainly due to the incorporation of prior knowledge about the scene and reasoning over the relations among visible objects. In this study, we propose a target-driven navigation model that incorporates ground-truth prior knowledge to navigate in a human-inspired way. We realize this approach by constructing a scene graph, which provides the spatial relationships among scene objects through object relation detection. To imitate human reasoning, we incorporate the prior with a Markov model to predict the next best action and concatenate this prediction into the policy network, which improves model generalization in both seen and unseen scenes. We perform experiments in the AI2THOR virtual environment [1] and outperform the current state of the art in both success rate and SPL (Success weighted by Path Length) on average.
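For readers unfamiliar with the SPL metric mentioned above, the following is a minimal sketch of how it is commonly computed: each successful episode contributes the ratio of the shortest-path length to the larger of the agent's path length and the shortest-path length, and the result is averaged over all episodes. The function name `spl` and its argument names are illustrative assumptions, not taken from the paper, and the paper's exact evaluation code may differ.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    agent_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    Assumed definition: SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i),
    where S_i is the success indicator, l_i the shortest-path length,
    and p_i the length of the path the agent actually took.
    """
    if not (len(successes) == len(shortest_lengths) == len(agent_lengths)):
        raise ValueError("all inputs must have the same number of episodes")
    n = len(successes)
    if n == 0:
        raise ValueError("at least one episode is required")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, agent_lengths):
        if not success:
            continue  # failed episodes contribute zero
        denom = max(taken, shortest)
        # Assumption: an episode that starts at the goal (zero-length paths)
        # counts as a perfectly efficient success.
        total += 1.0 if denom == 0 else shortest / denom
    return total / n


# Three episodes: an optimal success, a success with a path twice the
# shortest length, and a failure.
score = spl([True, True, False], [2.0, 4.0, 3.0], [2.0, 8.0, 3.0])
```

Under this definition a higher success rate alone does not guarantee a higher SPL, since long detours reduce each successful episode's contribution.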