Abstract: Despite great success in recent years, deep reinforcement learning architectures still face a tremendous challenge in dealing with uncertainty and perceptual ambiguity. Similarly, networks that learn to build a world model from the input and perform model-based decision making in novel environments (e.g., value iteration networks) are mostly limited to fully observable tasks. In this paper, we propose a new planning module architecture, the VI$^2$N (Value Iteration with Value of Information Network), that learns to act in novel environments with a high degree of perceptual ambiguity. This architecture prioritizes reducing uncertainty before exploiting the reward. Our network outperforms other deep architectures in challenging partially observable environments. Moreover, it generates interpretable cognitive maps highlighting both rewarding and informative locations. The similarity between the principles and computations of our network and the cognitive processes and neural activity observed in the hippocampus draws a strong connection between VI$^2$N and principles of computation in biological networks.
Submission Number: 27