Value Propagation Networks


Nov 03, 2017 (modified: Nov 03, 2017) ICLR 2018 Conference Blind Submission readers: everyone
  • Abstract: We present Value Propagation (VProp), a parameter-efficient differentiable planning module built on Value Iteration which can successfully be trained in a reinforcement learning fashion to solve unseen tasks, has the capability to generalize to larger map sizes, and can learn to navigate in dynamic environments. We evaluate on configurations of MazeBase grid-worlds, with randomly generated environments of several different sizes. Furthermore, we show that the module and its variants provide a simple way to learn to plan when adversarial agents are present and the environment is stochastic, providing a cost-efficient learning system to build low-level size-invariant planners for a variety of interactive navigation problems.
  • TL;DR: We propose Value Propagation, a novel end-to-end planner which can learn to solve 2D navigation tasks via Reinforcement Learning and which generalizes to larger and dynamic environments.
  • Keywords: Learning to plan, Reinforcement Learning, Value Iteration, Navigation, Convnets
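
For readers unfamiliar with Value Iteration, the algorithm the abstract says VProp is built on, the following is a minimal sketch of the classical tabular version on a small grid-world. It is not the VProp module itself, which is learned and differentiable. The grid layout, the step cost of -1, the discount factor of 0.9, and all names (`GRID`, `value_iteration`, and so on) are assumptions made purely for illustration.

```python
# Classical tabular value iteration on a tiny deterministic grid-world.
# Illustrative only: the grid, rewards, and discount are assumptions,
# and this is NOT the learned VProp module described in the abstract.

GRID = [
    "S.#.",   # S = start, . = free cell, # = wall, G = goal
    ".##.",
    "...G",
]
GAMMA = 0.9          # discount factor (assumed)
STEP_REWARD = -1.0   # cost paid for every move (assumed)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def value_iteration(grid, gamma=GAMMA, tol=1e-6):
    """Return a table of state values computed by synchronous sweeps."""
    rows, cols = len(grid), len(grid[0])
    values = [[0.0] * cols for _ in range(rows)]
    while True:
        delta = 0.0
        new_values = [row[:] for row in values]
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] in "#G":
                    continue  # walls and the terminal goal keep value 0
                best = float("-inf")
                for dr, dc in MOVES:
                    nr, nc = r + dr, c + dc
                    blocked = (
                        not (0 <= nr < rows and 0 <= nc < cols)
                        or grid[nr][nc] == "#"
                    )
                    if blocked:
                        nr, nc = r, c  # a blocked move leaves the agent in place
                    # Bellman backup: immediate reward plus discounted next value
                    best = max(best, STEP_REWARD + gamma * values[nr][nc])
                new_values[r][c] = best
                delta = max(delta, abs(best - values[r][c]))
        values = new_values
        if delta < tol:
            return values


values = value_iteration(GRID)
```

Acting greedily with respect to `values` (always moving to the neighbouring cell with the highest value) follows a shortest path to the goal in this toy setting. Note that the same backup is applied at every cell regardless of the grid's dimensions.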