An AI-Enabled All-In-One Visual, Proximity, and Tactile Perception Multimodal Sensor

30 Aug 2025 (modified: 01 Sept 2025) · IEEE IROS 2025 Workshop Tactile Sensing Submission · CC BY 4.0
Keywords: all-in-one multimodal sensor, deep learning, force sensing, proximity estimation, visual perception
TL;DR: We present an AI-enabled all-in-one sensor that unifies vision, proximity, and tactile perception for long-sequence robotic interactions.
Abstract: Visual, proximity, and tactile perception are essential sensing modalities that provide comprehensive information in interactive robotic tasks. However, integrating multiple sensors poses several challenges, including increased volume and cost, difficulties with signal synchronization, and multi-sensor cross-interference or signal disruption. To tackle these challenges, we propose the vision-proximity-tactility sensor (VPTS), an AI-enabled, all-in-one multimodal sensor designed for holistic perception through efficient collaboration and information transfer between modalities, enabling complex, long-sequence robotic interactions. Facilitated by a transparent membrane patterned with ultraviolet (UV)-excited fluorescent markers, VPTS uses a focus-adjustable monocular camera to switch between visual, proximity, and tactile perception in a time-division mode; modalities are switched by toggling the UV light, adjusting the camera focus, and selecting one of three dedicated deep learning models. VPTS achieves an F1 score of 0.9733 in visual perception, a mean absolute error of 5.098 mm in proximity estimation, and a root-mean-square error of 0.653 mN in force sensing. Real-world experiments, including a computer music game involving up to 28 consecutive subtasks, demonstrate a cohesive pipeline in which the sensing modalities collaboratively support long-sequence manipulations, verifying VPTS's effectiveness for intricate, multimodal interactive tasks.
Submission Number: 17