Investigating Representations for Vision And Touch in Contact Rich Robot Scooping Tasks

Published: 19 Mar 2024, Last Modified: 01 Jun 2024 · Tiny Papers @ ICLR 2024 Archive · CC BY 4.0
Keywords: representation learning, robot scooping, tactile, self-supervised learning
TL;DR: Extending an existing method to more modalities and investigating the best approach and architecture for a specific task
Abstract: Contact-rich robotic manipulation in unstructured environments remains an open challenge in robotics, with no established universal architectures or representations to handle the involved modalities. This paper analyzes different approaches for combining vision and touch to improve robotic scooping, using an open-source scooping dataset. We compare different architectures and modalities and analyze their impact on in-distribution and out-of-distribution performance. We find that the best-performing model on in-distribution terrains uses both vision and touch data and is trained end-to-end. However, the best-performing model on out-of-distribution terrains uses only vision data. Future work should explore larger, more diverse datasets and other self-supervised methods.
Submission Number: 254
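For readers unfamiliar with how an end-to-end vision-and-touch model of the kind compared in the abstract can be structured, the snippet below is a minimal, hypothetical late-fusion sketch: a small CNN vision encoder and an MLP touch encoder whose embeddings are concatenated and fed to a policy head. The layer sizes, input shapes, and action dimension are illustrative assumptions, not the architectures evaluated in the paper.

```python
# Hypothetical sketch of late-fusion of vision and touch; all dimensions are
# illustrative assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn


class VisionTouchFusion(nn.Module):
    """Encodes vision and touch separately, then fuses them for a policy head."""

    def __init__(self, vision_dim=128, touch_dim=32, hidden_dim=64, action_dim=4):
        super().__init__()
        # Vision encoder: small CNN over RGB frames (placeholder capacity).
        self.vision_encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, vision_dim),
        )
        # Touch encoder: MLP over a flat tactile / force-torque reading.
        self.touch_encoder = nn.Sequential(
            nn.Linear(touch_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # Fusion head: concatenate the two embeddings and predict an action.
        self.head = nn.Sequential(
            nn.Linear(vision_dim + hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
        )

    def forward(self, image, touch):
        z_v = self.vision_encoder(image)   # (B, vision_dim)
        z_t = self.touch_encoder(touch)    # (B, hidden_dim)
        return self.head(torch.cat([z_v, z_t], dim=-1))


# Example forward pass with random inputs (batch of 2, 64x64 RGB, 32-D touch).
model = VisionTouchFusion()
action = model(torch.randn(2, 3, 64, 64), torch.randn(2, 32))
print(action.shape)  # torch.Size([2, 4])
```

Training such a model end-to-end (e.g., with a behavior-cloning or regression loss on the action output) corresponds to the "both vision and touch, trained end-to-end" setting described in the abstract; dropping one encoder yields the single-modality baselines.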