Keywords: Value Iteration Networks, POMDP, decision making under uncertainty
Abstract: Despite great success in recent years, deep reinforcement learning architectures still face a tremendous challenge in many real-world scenarios due to perceptual ambiguity. Similarly, differentiable planning networks, known as value iteration networks, which generalize to novel situations by learning an environment model from training scenarios, are mostly limited to fully observable tasks. In this paper, we propose a new architecture, the VI$^2$N (Value Iteration with Value of Information Network), that can learn to act in novel environments with a high degree of uncertainty. Specifically, this architecture uses a heuristic that prioritizes reducing uncertainty before exploiting reward. Our network outperforms the state-of-the-art differentiable architecture for partially observable environments, especially when long-term planning is required to resolve uncertainty.