Binding Touch to Everything: Learning Unified Multimodal Tactile Representations

Published: 01 Jan 2024 · Last Modified: 13 May 2025 · CVPR 2024 · CC BY-SA 4.0
Abstract: Touch provides crucial information about the physical properties of the objects around us. Creating models that capture cross-modal associations between touch and other modalities, however, remains a challenging problem, due to the wide variety of touch sensors and the intensive effort required to collect tactile data. We propose UniTouch, a unified model for vision-based touch sensors that connects their tactile signals to other modalities, including vision, language, and sound. We achieve this by aligning our tactile embeddings to pretrained image embeddings that are already associated with a variety of other modalities. We further propose learnable sensor-specific tokens, allowing the model to learn from a set of heterogeneous tactile sensors, all at the same time. UniTouch is capable of conducting various touch sensing tasks in a zero-shot setting, from robot grasping prediction to touch-based question answering. To the best of our knowledge, UniTouch is the first model to demonstrate these capabilities. Project Page: https://cfeng16.github.io/UniTouch/.
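To make the two ideas in the abstract concrete (aligning touch embeddings to frozen, pretrained image embeddings, and learnable sensor-specific tokens for heterogeneous sensors), here is a minimal sketch in Python. It is not the authors' implementation: the encoder architecture, embedding dimension, the use of a symmetric InfoNCE-style loss, and all names (`TouchEncoder`, `contrastive_alignment_loss`) are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the alignment idea: a touch encoder
# whose outputs are pulled toward frozen, pretrained image embeddings with a
# symmetric contrastive loss, plus learnable per-sensor tokens so one model can
# serve several vision-based touch sensors at once. All module names, sizes,
# and the choice of backbone are assumptions, not details from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F


class TouchEncoder(nn.Module):
    def __init__(self, num_sensors: int, embed_dim: int = 512):
        super().__init__()
        # Stand-in convolutional backbone; UniTouch's actual encoder is not specified here.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # One learnable token per sensor type, letting heterogeneous sensors share
        # the encoder while still carrying sensor-specific information.
        self.sensor_tokens = nn.Embedding(num_sensors, embed_dim)

    def forward(self, touch_images: torch.Tensor, sensor_ids: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(touch_images) + self.sensor_tokens(sensor_ids)
        return F.normalize(feats, dim=-1)


def contrastive_alignment_loss(touch_emb: torch.Tensor,
                               image_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss aligning touch embeddings to frozen image embeddings."""
    image_emb = F.normalize(image_emb, dim=-1)
    logits = touch_emb @ image_emb.t() / temperature
    targets = torch.arange(touch_emb.size(0), device=touch_emb.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    encoder = TouchEncoder(num_sensors=4)
    touch = torch.randn(8, 3, 224, 224)        # batch of tactile images
    sensors = torch.randint(0, 4, (8,))        # which sensor produced each sample
    frozen_image_emb = torch.randn(8, 512)     # placeholder for pretrained image embeddings
    loss = contrastive_alignment_loss(encoder(touch, sensors), frozen_image_emb)
    print(loss.item())
```

Because the image embeddings come from a pretrained model that is already tied to language, sound, and other modalities, aligning touch to that space is what enables the zero-shot cross-modal tasks described above; in practice the real image embeddings would replace the random placeholder tensor used here.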