Neural All-Pairs Shortest Path for Reinforcement Learning

17 Oct 2022, 19:25 (modified: 09 Dec 2022, 14:31) | Deep RL Workshop 2022 | Readers: Everyone
Abstract: An informative, dense reward function is an important requirement for solving goal-reaching tasks efficiently. While the natural reward for such tasks is a binary signal indicating success or failure, providing only a binary reward makes learning very challenging given the sparsity of the feedback. Introducing dense rewards therefore helps to provide smooth gradients. However, such functions are not readily available, and constructing them is difficult: it often requires a lot of time and domain-specific knowledge, and can unintentionally create spurious local minima. We propose a method that learns neural all-pairs shortest paths, used as a distance function to learn a policy for goal-reaching tasks, requiring zero domain-specific knowledge. In particular, our approach includes both a self-supervised signal from the temporal distance between state pairs of an episode, and a metric-based regularizer that leverages the triangle inequality to provide additional connectivity information between state triples. This dynamical distance can be used either as a cost function or reshaped as a reward and, unlike previous work, is fully self-supervised, compatible with off-policy learning, and robust to local minima.
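The two training signals named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact loss, so everything below is an assumption made for illustration: the function names, the squared-error form of the temporal term, the hinge form of the triangle-inequality term, the `reg_weight` parameter, and the toy `line_dist` distance that stands in for a learned network.

```python
from itertools import combinations


def temporal_pairs(episode):
    """Yield (s_i, s_j, j - i) for every ordered pair i < j in one episode."""
    for i, j in combinations(range(len(episode)), 2):
        yield episode[i], episode[j], j - i


def temporal_loss(dist, episode):
    """Assumed form: mean squared error between d(s_i, s_j) and elapsed steps."""
    errors = [(dist(s, g) - steps) ** 2 for s, g, steps in temporal_pairs(episode)]
    return sum(errors) / len(errors)


def triangle_penalty(dist, triples):
    """Assumed form: mean hinge on violations of d(a, c) <= d(a, b) + d(b, c)."""
    violations = [
        max(0.0, dist(a, c) - dist(a, b) - dist(b, c)) for a, b, c in triples
    ]
    return sum(violations) / len(violations)


def total_loss(dist, episode, triples, reg_weight=1.0):
    """Temporal signal plus weighted triangle regularizer (hypothetical weighting)."""
    return temporal_loss(dist, episode) + reg_weight * triangle_penalty(dist, triples)


def shaped_reward(dist, state, goal):
    """One simple way to reshape a distance as a reward: negate it."""
    return -dist(state, goal)


def line_dist(s, g):
    """Toy stand-in for a learned distance: states are integers on a line."""
    return float(abs(g - s))


episode = [0, 1, 2, 3]              # one step per transition
triples = [(0, 1, 3), (0, 3, 1)]    # (a, b, c) state triples

print(total_loss(line_dist, episode, triples))
print(shaped_reward(line_dist, 0, 3))
```

In this toy setting the exact line distance matches every temporal target and satisfies the triangle inequality, so both terms vanish. A distance such as the squared difference would be penalized by the triangle term on the triple `(0, 1, 3)`, which is the kind of inconsistency the regularizer is meant to discourage.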
Supplementary Material: zip