An AI-Enabled All-In-One Visual, Proximity, and Tactile Perception Multimodal Sensor

30 Aug 2025 (modified: 01 Sept 2025) · IEEE IROS 2025 Workshop Tactile Sensing Submission · CC BY 4.0
Keywords: all-in-one multimodal sensor, deep learning, force sensing, proximity estimation, visual perception
TL;DR: We present an AI-enabled all-in-one sensor that unifies vision, proximity, and tactile perception for long-sequence robotic interactions.
Abstract: Visual, proximity, and tactile perception are essential sensing modalities that provide comprehensive information in interactive robotic tasks. However, integrating multiple sensors poses several challenges, including increased volume and cost, difficulties with signal synchronization, and multi-sensor cross-interference or signal disruption. To tackle these challenges, we propose the vision-proximity-tactility sensor (VPTS), an AI-enabled, all-in-one multimodal sensor designed for holistic perception through efficient collaboration and information transfer between modalities, enabling complex, long-sequence robotic interactions. Facilitated by a transparent membrane patterned with ultraviolet (UV)-excited fluorescent markers, VPTS uses a focus-adjustable monocular camera to switch between visual, proximity, and tactile perception in a time-division mode; modalities are switched by toggling the UV light, adjusting the camera focus, and selecting one of three dedicated deep learning models. VPTS achieves an F1 score of 0.9733 in visual perception, a mean absolute error of 5.098 mm in proximity estimation, and a root-mean-square error of 0.653 mN in force sensing. Real-world experiments, including a computer music game involving up to 28 consecutive subtasks, demonstrate a cohesive pipeline in which the sensing modalities collaboratively support long-sequence manipulations, verifying VPTS's effectiveness for intricate, multimodal interactive tasks.
Submission Number: 17