Abstract: Our motivation is to scale value iteration to larger environments without a huge increase in computational demand, and fix the problems inherent to Value Iteration Networks (VIN) such as spatial invariance and unstable optimization. We show that VINs, and even extended VINs which improve some of their shortcomings, are empirically difficult to optimize, exhibiting instability during training and sensitivity to random seeds. Furthermore, we explore whether the inductive biases utilized in past differentiable path planning modules are even necessary, and demonstrate that the requirement that the architectures strictly resemble path-finding algorithms does not hold. We do this by designing a new path planning architecture called the LSTM-Iteration Network, which achieves better performance than VINs in metrics such as success rate, training stability, and sensitivity to random seeds.
Keywords: deep reinforcement learning, path planning
TL;DR: We introduce a new path planning architecture called the LSTM-Iteration Network, which achieves better performance than Value Iteration Networks in metrics such as success rate, training stability, and sensitivity to random seeds.
4 Replies
Loading