ViSaRL: Visual Reinforcement Learning Guided by Human Saliency

Published: 07 May 2023, Last Modified: 15 May 2023, ICRA-23 Workshop on Pretraining4Robotics (Spotlight)
Keywords: Visual Reinforcement Learning, Visual Representation Learning, Human Saliency Maps, Multimodal Learning
TL;DR: We propose a framework for incorporating weakly supervised, human-annotated saliency maps into visual RL and demonstrate that it consistently improves success rates on various robot manipulation tasks from the Meta-World benchmark.
Abstract: Training autonomous agents to perform complex control tasks from high-dimensional pixel input using reinforcement learning (RL) is challenging and sample-inefficient. When performing a task, people visually attend to task-relevant objects and areas. By contrast, pixel observations in visual RL consist primarily of task-irrelevant information. To bridge this gap, we introduce Visual Saliency-Guided Reinforcement Learning (ViSaRL). Using ViSaRL to learn visual scene encodings improves the success rate of an RL agent on four challenging visual robot control tasks in the Meta-World benchmark. This finding holds across two different visual encoder backbone architectures, with absolute success-rate gains of 13% and 18% on average for the CNN and Transformer-based visual encoders, respectively. The Transformer-based visual encoder achieves a 10% absolute gain in success rate even when saliency is only available during pretraining.
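The abstract states that saliency is used while learning the visual scene encoding, and can be dropped after pretraining. As a rough illustration of that idea (not the authors' implementation), the sketch below pretrains a small CNN encoder with an auxiliary head that predicts a human-annotated saliency map from the latent; the module names, shapes, and loss are illustrative assumptions only.

```python
# Minimal sketch (illustrative, not the ViSaRL implementation): pretrain a
# visual encoder with an auxiliary saliency-prediction objective so the
# latent used by the RL policy retains task-relevant (salient) information.
# All architecture choices, shapes, and losses below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SaliencyGuidedEncoder(nn.Module):
    def __init__(self, latent_dim: int = 256):
        super().__init__()
        # Simple CNN backbone producing a latent vector for the downstream RL agent.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, latent_dim),
        )
        # Auxiliary head that predicts a low-resolution saliency map from the latent.
        self.saliency_head = nn.Sequential(
            nn.Linear(latent_dim, 16 * 16), nn.Sigmoid(),
        )

    def forward(self, obs: torch.Tensor):
        z = self.backbone(obs)                          # latent used by the policy
        sal = self.saliency_head(z).view(-1, 1, 16, 16)  # predicted saliency
        return z, sal


# One pretraining step on placeholder data: supervise the predicted saliency
# with a human-annotated map so the latent encodes where people attend.
encoder = SaliencyGuidedEncoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)

obs = torch.rand(8, 3, 128, 128)           # placeholder pixel observations
human_saliency = torch.rand(8, 1, 16, 16)  # placeholder annotated saliency maps

_, predicted_saliency = encoder(obs)
loss = F.binary_cross_entropy(predicted_saliency, human_saliency)
loss.backward()
optimizer.step()
```

After pretraining, only the backbone latent would be passed to the RL agent, which is consistent with the abstract's observation that gains persist even when saliency is available only during pretraining.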