Abstract: We present Value Propagation (VProp), a parameter-efficient differentiable planning module built on Value Iteration which can successfully be trained in a reinforcement learning fashion to solve unseen tasks, has the capability to generalize to larger map sizes, and can learn to navigate in dynamic environments. We evaluate on configurations of MazeBase grid-worlds, with randomly generated environments of several different sizes. Furthermore, we show that the module enables to learn to plan when the environment also includes stochastic elements, providing a cost-efficient learning system to build low-level size-invariant planners for a variety of interactive navigation problems.
TL;DR: We propose Value Propagation, a novel end-to-end planner which can learn to solve 2D navigation tasks via Reinforcement Learning, and that generalizes to larger and dynamic environments.
Keywords: Learning to plan, Reinforcement Learning, Value Iteration, Navigation, Convnets