UniT: Data Efficient Tactile Representation With Generalization to Unseen Objects

Zhengtong Xu, Raghava Uppuluri, Xinwei Zhang, Cael Fitch, Philip G. Crandall, Wan Shou, Dongyi Wang, Yu She

Published: 2025, Last Modified: 29 Apr 2026, IEEE Robotics Autom. Lett. 2025, CC BY-SA 4.0

Abstract: UniT is an approach to tactile representation learning that uses VQGAN to learn a compact latent space, which serves as the tactile representation. The representation is trained on tactile images obtained from a single simple object, yet it generalizes to unseen objects. It can be zero-shot transferred to various downstream tasks, including perception tasks and manipulation policy learning. Our benchmarks on in-hand 3D pose and 6D pose estimation tasks and a tactile classification task show that UniT outperforms existing visual and tactile representation learning methods. Additionally, UniT's effectiveness in policy learning is demonstrated across three real-world tasks involving diverse manipulated objects and complex robot-object-environment interactions. Through extensive experimentation, UniT is shown to be a simple-to-train, plug-and-play, yet widely effective method for tactile representation learning.
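
For readers unfamiliar with VQGAN, the following is a minimal sketch of the vector-quantization lookup that VQGAN-style models use to discretize their latent space. It is not UniT's implementation, and the abstract does not say exactly how UniT extracts its representation from the trained model. The function name `quantize`, the toy 2-D codebook, and the sample latents are all assumptions made for illustration; in a real model, an encoder would produce the latents from tactile images and the codebook would be learned during training.

```python
import numpy as np


def quantize(latents, codebook):
    """Replace each latent vector with its nearest codebook entry (L2 distance).

    latents:  array of shape (N, D), hypothetical encoder outputs
    codebook: array of shape (K, D), hypothetical learned code vectors
    Returns the quantized latents (N, D) and the chosen code indices (N,).
    """
    # Pairwise squared distances between every latent and every code: (N, K)
    diffs = latents[:, None, :] - codebook[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=-1)
    # Index of the closest code for each latent
    indices = sq_dists.argmin(axis=1)
    return codebook[indices], indices


# Toy data (illustrative only): four codes at the corners of the unit square
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
latents = np.array([[0.1, -0.1], [0.9, 0.2], [0.8, 0.9]])

quantized, indices = quantize(latents, codebook)
print(indices.tolist())  # nearest code index for each latent
print(quantized)         # the corresponding code vectors
```

Each latent snaps to the closest code, which is what makes the learned latent space compact. A full VQGAN additionally trains the encoder, decoder, and codebook jointly with reconstruction and adversarial losses, which this sketch omits.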

External IDs: dblp:journals/ral/XuUZFCSWS25