TL;DR: A novel 5000-layer Dynamic Transition Value Iteration Network performs well on extremely long-term, large-scale planning tasks
Abstract: The Value Iteration Network (VIN) is an end-to-end differentiable neural network architecture for planning. It exhibits strong generalization to unseen domains by incorporating a differentiable planning module that operates on a latent Markov Decision Process (MDP). However, VINs struggle to scale to long-term, large-scale planning tasks, such as navigating a $100\times 100$ maze---a task that typically requires thousands of planning steps to solve. We observe that this deficiency stems from two issues: the limited representational capacity of the latent MDP and the insufficient depth of the planning module. We address these by augmenting the latent MDP with a dynamic transition kernel, which dramatically improves its representational capacity, and by introducing an "adaptive highway loss" that constructs skip connections to improve gradient flow and mitigate the vanishing gradient problem. We evaluate our method on 2D and 3D maze navigation environments, a continuous control task, and a real-world lunar rover navigation task. We find that our new method, named the Dynamic Transition VIN (DT-VIN), scales to 5000 layers and solves challenging versions of the above tasks. Altogether, we believe that DT-VIN represents a concrete step forward in performing long-term, large-scale planning in complex environments.
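To make the architectural idea concrete, below is a minimal sketch (not the authors' released implementation) of a VIN-style value-iteration module in which the latent MDP's transition kernel is predicted per grid cell from the observation rather than shared across the map. The layer names, shapes, number of iterations, and the softmax normalization of the kernel are illustrative assumptions; the paper's adaptive highway loss would additionally attach auxiliary supervision to intermediate value maps, which is omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicTransitionVI(nn.Module):
    """Sketch of value iteration with a per-cell (dynamic) transition kernel
    on a grid-world latent MDP, in the spirit of VIN/DT-VIN."""

    def __init__(self, in_channels=2, n_actions=8, k=3, iterations=50):
        super().__init__()
        self.n_actions, self.k, self.iterations = n_actions, k, iterations
        # Hypothetical head mapping the observation to a latent reward map.
        self.reward_head = nn.Conv2d(in_channels, 1, kernel_size=3, padding=1)
        # Hypothetical head predicting one k*k transition stencil per action
        # per grid cell (this is what makes the kernel "dynamic").
        self.kernel_head = nn.Conv2d(in_channels, n_actions * k * k,
                                     kernel_size=3, padding=1)

    def forward(self, obs):
        b, _, h, w = obs.shape
        r = self.reward_head(obs)                        # (b, 1, h, w)
        kernels = self.kernel_head(obs)                  # (b, A*k*k, h, w)
        kernels = kernels.view(b, self.n_actions, self.k * self.k, h * w)
        kernels = torch.softmax(kernels, dim=2)          # normalize transitions

        v = torch.zeros(b, 1, h, w, device=obs.device)
        for _ in range(self.iterations):
            # Gather the k*k neighborhood of V around every cell.
            patches = F.unfold(v, self.k, padding=self.k // 2)   # (b, k*k, h*w)
            # Expected next-state value per action under the dynamic kernel.
            ev = (kernels * patches.unsqueeze(1)).sum(dim=2)     # (b, A, h*w)
            q = ev.view(b, self.n_actions, h, w) + r
            v = q.max(dim=1, keepdim=True).values                # Bellman backup
        return q, v
```

As a usage example, `obs` could be a 2-channel maze-and-goal image of shape (batch, 2, 100, 100), with a policy head reading actions off the final Q map; scaling the iteration count toward thousands is what motivates the gradient-flow fix described in the abstract.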
Lay Summary: Planning is an essential skill for intelligent agents, enabling them to figure out how to reach goals in complex environments. A popular method, the Value Iteration Network (VIN), allows artificial agents to plan by mimicking how humans and robots think ahead. However, VINs fail when the environment becomes large or the task requires many steps to complete. In this work, we propose an improved version of VIN, called the Dynamic Transition VIN (DT-VIN). It introduces two key ideas: (1) a more flexible internal model that better captures the structure of the environment, and (2) a special training technique that helps extremely deep networks learn efficiently. These changes allow our model to plan across 5,000 steps, far more than previous methods. We test DT-VIN on a range of tasks, from simple maze navigation to controlling robots and planning routes for lunar rovers. Across all these tasks, DT-VIN consistently outperforms existing methods, showing that it is better at solving long, complicated planning problems. Our work brings AI one step closer to handling real-world challenges that involve complex, long-term decision-making.
Primary Area: Reinforcement Learning
Keywords: Value Iteration Networks, Long-term Planning, Reinforcement Learning, Deep Neural Network
Flagged For Ethics Review: true
Submission Number: 7842