Robust Manipulation with Spatial Features

Published: 17 Nov 2022, Last Modified: 05 May 2023 · PRL 2022 Poster
Abstract: Our goal is to develop visual pre-training strategies that enable more robust and efficient manipulation policy learning. We find that a Vision Transformer trained with a distillation loss that biases representations towards shape exhibits strong zero-shot transfer performance on the KitchenShift suite, even when compared against baselines trained on larger and more task-relevant datasets. When finetuned, the attention heads of a transformer trained with a shape bias can be visualized as a spatial feature map that emergently segments manipulation-relevant objects in an image. By leveraging both of these insights, we improve the average zero-shot performance of policies trained on the sliding-door task in the FrankaKitchen environment by nearly 2x over the next best method. Additionally, we improve maximum in-distribution success by 13% by masking out attention heads that attend to distractors.
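To make the attention-head visualization and masking concrete, below is a minimal sketch of how per-head attention from the CLS token can be reshaped into spatial maps and how selected heads can be zeroed out. It is illustrative only: the ViT-S/16-style dimensions, the use of the final block's query/key tensors, and the helper names `cls_attention_maps` and `mask_heads` are assumptions, not the authors' implementation.

```python
import torch

# Hypothetical settings for illustration: a ViT-S/16-style layout on 224x224 inputs.
NUM_HEADS = 6
GRID = 14                    # 224 / 16 patches per side
NUM_PATCHES = GRID * GRID


def cls_attention_maps(q, k):
    """Per-head attention from the CLS token to every patch token,
    reshaped into a (batch, heads, GRID, GRID) spatial feature map.

    q, k: (batch, heads, 1 + NUM_PATCHES, head_dim) query/key tensors,
    assumed to come from the final self-attention block of the ViT.
    """
    scale = q.shape[-1] ** -0.5
    attn = (q @ k.transpose(-2, -1)) * scale          # (B, H, N, N)
    attn = attn.softmax(dim=-1)
    cls_to_patches = attn[:, :, 0, 1:]                # CLS row, drop the CLS column
    return cls_to_patches.reshape(-1, NUM_HEADS, GRID, GRID)


def mask_heads(maps, distractor_heads):
    """Zero out heads that attend to distractors before the maps are
    pooled into a policy feature. Which heads count as distractor heads
    would be chosen by inspecting the visualized maps, per the abstract."""
    keep = torch.ones(NUM_HEADS, dtype=maps.dtype)
    keep[list(distractor_heads)] = 0.0
    return maps * keep.view(1, NUM_HEADS, 1, 1)


if __name__ == "__main__":
    B, HEAD_DIM = 2, 64
    q = torch.randn(B, NUM_HEADS, 1 + NUM_PATCHES, HEAD_DIM)
    k = torch.randn(B, NUM_HEADS, 1 + NUM_PATCHES, HEAD_DIM)
    maps = cls_attention_maps(q, k)                    # (2, 6, 14, 14)
    masked = mask_heads(maps, distractor_heads=[3])    # drop a hypothetical head
    print(maps.shape, masked.shape)
```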