Value Propagation Networks
Nov 03, 2017 (modified: Nov 03, 2017). ICLR 2018 Conference Blind Submission.
Abstract: We present Value Propagation (VProp), a parameter-efficient differentiable planning module built on Value Iteration which can successfully be trained in a reinforcement learning fashion to solve unseen tasks, generalizes to larger map sizes, and can learn to navigate in dynamic environments. We evaluate on configurations of MazeBase grid-worlds, with randomly generated environments of several different sizes. Furthermore, we show that the module and its variants provide a simple way to learn to plan when adversarial agents are present and the environment is stochastic, providing a cost-efficient learning system to build low-level size-invariant planners for a variety of interactive navigation problems.
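The abstract does not spell out the VProp architecture itself, but the classical Value Iteration procedure it builds on can be sketched for a 2D grid navigation task. The sketch below is an illustrative, hypothetical implementation (the function name, reward/obstacle encoding, and hyperparameters are assumptions, not the paper's): each cell's value is repeatedly updated from its best 4-connected neighbour, which is the max-pooling-over-neighbours structure that differentiable planners such as VProp make learnable.

```python
import numpy as np

def value_iteration(reward, obstacles, gamma=0.9, iters=50):
    """Classical value iteration on a 2D grid with 4-connected moves.

    reward:    (H, W) array of per-cell rewards (e.g. 1.0 at the goal).
    obstacles: (H, W) boolean mask of impassable cells.
    Returns the value map V; a greedy policy simply moves to the
    highest-valued neighbour at each step.
    """
    v = np.zeros_like(reward, dtype=float)
    for _ in range(iters):
        # Replicate border values so edge cells see themselves as neighbours
        # (equivalent to treating the map boundary as a wall).
        padded = np.pad(v, 1, mode="edge")
        neighbours = np.stack([
            padded[:-2, 1:-1],   # value of cell above
            padded[2:, 1:-1],    # value of cell below
            padded[1:-1, :-2],   # value of cell to the left
            padded[1:-1, 2:],    # value of cell to the right
        ])
        # Bellman backup: immediate reward plus discounted best neighbour.
        v = reward + gamma * neighbours.max(axis=0)
        v[obstacles] = -np.inf   # impassable cells never carry value
    return v
```

With a single goal cell carrying reward 1, the resulting value map decays geometrically (by a factor of `gamma` per step) with path distance from the goal, so greedy ascent on it recovers a shortest path around obstacles.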
TL;DR: We propose Value Propagation, a novel end-to-end planner which can learn to solve 2D navigation tasks via Reinforcement Learning, and which generalizes to larger and dynamic environments.
Keywords: Learning to plan, Reinforcement Learning, Value Iteration, Navigation, Convnets