Tactile Beyond Pixels: Multisensory Touch Representations for Robot Manipulation

Published: 08 Aug 2025, Last Modified: 23 Sept 2025 · CoRL 2025 Oral · CC BY 4.0
Keywords: Multi-sensory Touch, Self-Supervised Learning, Tactile Adaptation
TL;DR: TacX, the first self-supervised, general-purpose multisensory touch representations across four key modalities: image, audio, inertial measurements (IMU), and pressure.
Abstract: We present TacX, the first multisensory touch representations across four tactile modalities: image, audio, motion, and pressure. Trained on ~1M contact-rich interactions collected with the Digit 360 sensor, TacX captures complementary touch signals at diverse temporal and spatial scales. By leveraging self-supervised learning, TacX fuses these modalities into a unified representation that captures physical properties useful for downstream robot manipulation tasks. We study how to effectively integrate real-world touch representations for both imitation learning and tactile adaptation of sim-trained policies, showing that TacX boosts policy success rates by 63% over an end-to-end model using tactile images and improves robustness by 90% in recovering object states from touch. Finally, we benchmark TacX’s ability to infer physical properties, such as object-action identification, material-quantity estimation, and force estimation. TacX improves accuracy in characterizing physical properties by 48% compared to end-to-end approaches, demonstrating the advantages of multisensory pretraining for capturing features essential for dexterous manipulation.
Supplementary Material: zip
Submission Number: 661