SSVTP: Self-Supervised Visuo-Tactile Pretraining to contact deformation representation learning via multi-sensor

Jiangyu Hu


11 Aug 2025 (modified: 01 Sept 2025), IEEE IROS 2025 Workshop Tactile Sensing Submission, License: CC BY 4.0

Keywords: Contact-Rich Manipulation

Abstract: In most contact-rich manipulation tasks, humans apply time-varying forces to the target object, compensating for inaccuracies in the vision-guided hand trajectory. However, current robot learning algorithms focus primarily on trajectory-based policies, and little attention has been given to learning force-related skills. To address this limitation, we introduce ForceMimic, a force-centric robot learning system that combines a natural, force-aware, robot-free demonstration collection system with a hybrid force-motion imitation learning algorithm for robust contact-rich manipulation. Using the proposed ForceCapture system, an operator can peel a zucchini in 5 minutes, whereas force-feedback teleoperation takes over 13 minutes and struggles to complete the task. With the collected data, we propose HybridIL to train a force-centric imitation learning model, equipped with a hybrid force-position control primitive to fit the predicted wrench-position parameters during robot execution. Experiments demonstrate that our approach enables the model to learn a more robust policy on the contact-rich task of vegetable peeling, increasing the success rate by 54.5% relative to state-of-the-art pure-vision-based imitation learning.
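The abstract does not specify how the hybrid force-position control primitive turns predicted wrench and position targets into a command. As a rough illustration only, the sketch below uses the classic selection-matrix formulation of hybrid force/position control, which is not necessarily what HybridIL does. Under this formulation, each Cartesian axis is either position-controlled or force-controlled. The function `hybrid_command`, the `HybridGains` values, and the per-axis selection vector are hypothetical. The sketch also simplifies heavily: it returns per-axis commands in mixed units, and a real controller would map these through the manipulator Jacobian.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class HybridGains:
    kp: float = 1.0  # hypothetical proportional gain on position error
    kf: float = 0.1  # hypothetical proportional gain on force error


def hybrid_command(
    selection: Sequence[int],  # per axis: 1 = position-controlled, 0 = force-controlled
    x_des: Sequence[float],    # target position (stand-in for a predicted position parameter)
    x: Sequence[float],        # measured position
    f_des: Sequence[float],    # target force (stand-in for a predicted wrench parameter)
    f: Sequence[float],        # measured force
    gains: Optional[HybridGains] = None,
) -> List[float]:
    """Per-axis command in the style of selection-matrix hybrid force/position control.

    Position-controlled axes receive a proportional correction toward x_des.
    Force-controlled axes receive a feedforward of f_des plus a proportional
    correction on the force error. This is a hypothetical sketch, not HybridIL.
    """
    g = gains if gains is not None else HybridGains()
    n = len(selection)
    if not all(len(v) == n for v in (x_des, x, f_des, f)):
        raise ValueError("all vectors must have the same length")
    cmd: List[float] = []
    for s, xd, xi, fd, fi in zip(selection, x_des, x, f_des, f):
        if s not in (0, 1):
            raise ValueError("selection entries must be 0 or 1")
        pos_term = g.kp * (xd - xi)
        force_term = fd + g.kf * (fd - fi)
        # The selection entry decides which term drives this axis.
        cmd.append(s * pos_term + (1 - s) * force_term)
    return cmd


if __name__ == "__main__":
    # Hypothetical peeling-like setup: track x and y in position, press along z in force.
    print(hybrid_command(
        selection=[1, 1, 0],
        x_des=[0.10, 0.00, 0.0], x=[0.08, 0.01, 0.0],
        f_des=[0.0, 0.0, -5.0], f=[0.0, 0.0, -3.0],
    ))
```

In this setup, the tool's path along the surface is position-controlled while the pressing direction is force-controlled. This is the usual reason to split axes in a peeling-like task: if the normal axis were position-controlled, small errors in the vision-guided trajectory would cause the tool to either lose contact or dig in.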

Submission Number: 5
