Keywords: reinforcement learning, value iteration networks, planning under uncertainty
TL;DR: We propose a novel deep architecture for decision making under uncertainty, based on planning for both reward maximization and information gathering.
Abstract: Deep neural networks that incorporate classic reinforcement learning methods, such as value iteration, into their structure significantly outperform randomly structured networks in learning and generalization. These networks, however, are mostly limited to environments with little or no uncertainty and do not extend well to partially observable settings. In this paper, we propose a new planning module architecture, the VI$^2$N (Value Iteration with Value of Information Network), that learns to act in novel environments with high perceptual ambiguity. The architecture prioritizes reducing uncertainty before exploiting reward. VI$^2$N can also leverage factorization in environments with mixed observability to reduce the computational cost of computing the policy and to facilitate learning. Tested on a range of grid-based navigation tasks spanning environments with different degrees of observability, our network outperforms other deep architectures. Moreover, VI$^2$N generates interpretable cognitive maps that highlight both rewarding and informative locations, i.e., the key states the agent must visit to achieve its goal.
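The abstract does not specify the internals of the VI$^2$N module, so the following is only a minimal, illustrative sketch of the idea it builds on: tabular value iteration whose Bellman backup is augmented with a value-of-information bonus that favors uncertainty-reducing actions. The function name `value_iteration_with_info_bonus`, the `info_gain` term, and the weight `beta` are hypothetical stand-ins, not the paper's actual learned planning module.

```python
# Minimal sketch (not the paper's implementation): tabular value iteration
# with a hypothetical value-of-information bonus added to the reward.
import numpy as np

def value_iteration_with_info_bonus(P, R, info_gain, gamma=0.95, beta=1.0,
                                    n_iters=100):
    """P: (A, S, S) transition probabilities; R: (S, A) rewards;
    info_gain: (S, A) assumed-given expected reduction in belief uncertainty;
    beta trades off information gathering against reward maximization."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(n_iters):
        # Bellman backup: Q(s,a) = R(s,a) + beta*info_gain(s,a) + gamma*E[V(s')]
        Q = R + beta * info_gain + gamma * np.einsum("ast,t->sa", P, V)
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)

# Toy 2-state, 2-action example: action 1 is less rewarding but informative.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],     # action 0: stay in place
              [[0.5, 0.5], [0.5, 0.5]]])    # action 1: random transition
R = np.array([[1.0, 0.0], [0.0, 0.5]])
info_gain = np.array([[0.0, 0.8], [0.0, 0.8]])  # action 1 reduces uncertainty
V, policy = value_iteration_with_info_bonus(P, R, info_gain)
print(V, policy)
```

In the paper's partially observable setting, the backup would run over beliefs rather than states, and both the reward and information terms would be learned; here `info_gain` is simply supplied to keep the sketch self-contained.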
Primary Area: reinforcement learning
Submission Number: 13955