Keywords: reinforcement learning, learning to plan, navigation
Abstract: Sparse rewards and long decision horizons make agent navigation tasks difficult to solve with reinforcement learning (RL) methods such as (deep) Q-learning. Previous work has shown that some of these tasks can be solved efficiently by value-based planning in a state space abstraction, which defines sub-goals for a policy in the original state space. However, value-based planning scales poorly with the number of state space dimensions. Consequently, the planning might not be able to consider all the state information, such as other agents' behaviors. Combining the benefits of planning and learning values, we propose the Value Refinement Network (VRN), an architecture that locally refines a plan in a (simpler) state space abstraction, represented by a pre-computed value function, with respect to the full agent state. Trained via RL, the VRN can learn to correct this initial plan effectively and thereby solve tasks that would otherwise require a prohibitively large abstraction. In evaluations on several simulated agent navigation tasks, we demonstrate the benefits of our VRN: we show that it can successfully refine shortest-path plans to match the performance of value iteration in a more complex state space. Furthermore, in vehicle parking tasks where considering all relevant state space dimensions in planning is infeasible, the VRN still enables high task completion rates.
One-sentence Summary: The Value Refinement Network (VRN) is an architecture that refines a simple plan locally with respect to the full agent state.
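To make the idea concrete, here is a minimal, illustrative sketch of the two-stage setup the abstract describes: a cheap pre-computed value function (here, shortest-path distances on a 2D grid abstraction, obtained via BFS) that is then refined locally around the agent by a learned correction. The grid layout, the function names (`shortest_path_values`, `refine_locally`), and the `correction` callable standing in for the trained network are all hypothetical choices for this sketch, not the paper's actual implementation.

```python
from collections import deque

def shortest_path_values(grid, goal):
    """Pre-computed abstract plan: BFS distance-to-goal on a 4-connected
    grid (0 = free, 1 = obstacle). None marks unreachable cells."""
    rows, cols = len(grid), len(grid[0])
    dist = [[None] * cols for _ in range(rows)]
    dist[goal[0]][goal[1]] = 0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and dist[nr][nc] is None):
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist

def refine_locally(dist, pos, correction, radius=1):
    """Stand-in for the learned refinement: turn distances into values
    (negated distance) and add a state-dependent correction, but only
    in a local window around the agent's position."""
    refined = {}
    for r in range(pos[0] - radius, pos[0] + radius + 1):
        for c in range(pos[1] - radius, pos[1] + radius + 1):
            if 0 <= r < len(dist) and 0 <= c < len(dist[0]) and dist[r][c] is not None:
                refined[(r, c)] = -dist[r][c] + correction((r, c))
    return refined

# Usage: abstract plan on a 3x3 grid with one obstacle, then a local
# refinement around the agent (a zero correction stands in for the VRN).
grid = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
values = shortest_path_values(grid, goal=(2, 2))
local = refine_locally(values, pos=(1, 2), correction=lambda rc: 0.0)
```

The point of the split is that `shortest_path_values` runs once over a low-dimensional abstraction, while the (learned) correction only needs to act on a small window conditioned on the full agent state, avoiding value iteration over the full state space.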