Offline Reinforcement Learning for Customizable Visual Navigation

Dhruv Shah; Arjun Bhorkar; Hrishit Leen; Ilya Kostrikov; Nicholas Rhinehart; Sergey Levine

Offline Reinforcement Learning for Customizable Visual Navigation

Dhruv Shah, Arjun Bhorkar, Hrishit Leen, Ilya Kostrikov, Nicholas Rhinehart, Sergey Levine

08 Oct 2022 (modified: 05 May 2023)Deep RL Workshop 2022Readers: Everyone

Keywords: robotics, reinforcement learning

TL;DR: Offline RL doesn't scale well for long-horizon navigation but using values predicted with ORL within a topological graph framework can enable cool behavior on real robots!

Abstract: Robotic navigation often requires not only reaching a distant goal, but also satisfying intermediate user preferences on the path, such as obeying the rules of the road or preferring some surfaces over others. Our goal in this paper is to devise a robotic navigation system that can utilize previously collect data to learn navigational strategies that are responsive to user-specified utility functions, such as preferring specific surfaces or staying in sunlight (e.g., to maintain solar power). To this end, we show how offline reinforcement learning can be used to learn reward-specific value functions for long-horizon navigation that can then be composed with planning methods to reach distant goals, while still remaining responsive to user-specified navigational preferences. This approach can utilize large amounts of previously collected data, which is relabeled with the task reward. This makes it possible to incorporate diverse data sources and enable effective generalization in the real world, without any simulation, task-specific data collection, or demonstrations. We evaluate our system, ReViND, using a large navigational dataset from prior work, without any data collection specifically for the reward functions that we test. We demonstrate that our system can control a real-world ground robot to navigate to distant goals using only offline training from this dataset, and exhibit behaviors that qualitatively differ based on the user-specified reward function.

0 Replies

Loading