RGB-D Visual Perception for Occluded Scenes via Event Camera

Published: 01 Jan 2025 · Last Modified: 04 Nov 2025 · Int. J. Comput. Vis. 2025 · CC BY-SA 4.0
Abstract: This paper presents the first RGB-D visual perception method and dataset for densely occluded scenes. Under such dense occlusion, existing synthetic aperture imaging methods can recover only the 2D appearance of the target scene. In contrast, our method sees through dense foreground occlusions and recovers both the 2D appearance and the 3D structure of the target scene, which is more beneficial for downstream applications. To achieve this, the method takes as input the occluded frames and the event stream captured by a moving event camera, whose high temporal resolution provides sufficient visual information about the densely occluded scene. To suppress the noise introduced by dense foreground occlusions, an occlusion segmentation module guided by event epipolar-plane images predicts occlusion masks for the input frames and event stream; occluded regions are then excluded according to the predicted masks, and the remaining valid visual features are extracted to jointly predict the appearance and structure of the target scene. A lightweight high-order conditional random fields module models multi-pixel higher-order correlations, so that pixels with similar color and structure receive smoother features, and a cross-modal edge consistency mechanism enforces consistent RGB-D visual perception. In addition, we construct a hybrid vision acquisition system and collect the first Event-enhanced Occluded scene RGB-D Visual Perception dataset, named \(\hbox {THU}^\text {E-OccVP}\), which will be released as the first RGB-D visual perception benchmark for densely occluded scenes. Experimental results show that our framework significantly outperforms baseline solutions, and ablation experiments further demonstrate the effectiveness of each proposed module.
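To illustrate the epipolar-plane-image cue that the occlusion segmentation module builds on, below is a minimal sketch in Python/NumPy. It is an assumption-laden toy, not the paper's code: events are taken as (x, y, t, polarity) tuples from a horizontally translating camera, and `event_epi` is a hypothetical helper. Under linear motion, a scene point at depth Z traces a line in the x-t plane with slope proportional to 1/Z, so near occluders and the distant target separate by slope in the event EPI.

```python
import numpy as np

def event_epi(events, row, width, t_bins, t_min, t_max):
    """Accumulate signed event polarities of one pixel row into an
    x-t histogram (the event EPI for that row).
    Hypothetical helper; not the paper's implementation."""
    epi = np.zeros((t_bins, width), dtype=np.float32)
    for x, y, t, p in events:
        if y != row or not (t_min <= t < t_max):
            continue
        ti = int((t - t_min) / (t_max - t_min) * t_bins)
        epi[min(ti, t_bins - 1), x] += 1.0 if p > 0 else -1.0
    return epi

# Toy usage: two synthetic event tracks with different x-t slopes,
# mimicking a near occluder (fast apparent motion) and a far target
# (slow apparent motion) under a translating camera.
rng = np.random.default_rng(0)
ts = rng.uniform(0.0, 1.0, 2000)
near = [(int(50 + 80 * t) % 200, 10, t, 1) for t in ts[:1000]]    # fast
far  = [(int(120 + 15 * t) % 200, 10, t, -1) for t in ts[1000:]]  # slow
epi = event_epi(near + far, row=10, width=200, t_bins=64,
                t_min=0.0, t_max=1.0)
print(epi.shape)  # (64, 200); slope clusters separate occluder from target
```

Segmenting by slope in such an EPI is what lets occluder events be masked out before feature extraction; the actual module in the paper learns this separation rather than clustering slopes explicitly.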
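The cross-modal edge consistency mechanism can likewise be pictured with a small sketch. The version below is a plausible stand-in, not the paper's formulation: `grad_mag` and `edge_consistency_loss` are hypothetical names, and the idea shown is simply that edges in the recovered RGB image and the recovered depth map should coincide, measured here as an L1 distance between normalized gradient-magnitude maps.

```python
import numpy as np

def grad_mag(img):
    """Gradient magnitude via forward differences, cropped to a
    common shape and normalized to [0, 1]."""
    gx = np.abs(np.diff(img, axis=1))[:-1, :]
    gy = np.abs(np.diff(img, axis=0))[:, :-1]
    g = np.hypot(gx, gy)
    return g / (g.max() + 1e-8)

def edge_consistency_loss(rgb, depth):
    """L1 distance between the edge map of the RGB prediction
    (reduced to luminance) and that of the depth prediction."""
    gray = rgb.mean(axis=2)  # simple luminance proxy
    return np.abs(grad_mag(gray) - grad_mag(depth)).mean()

# Toy usage: a shared step edge in both modalities yields a loss
# near zero, since the two edge maps coincide after normalization.
rgb = np.zeros((32, 32, 3)); rgb[:, 16:] = 1.0
depth = np.zeros((32, 32));  depth[:, 16:] = 2.0
print(edge_consistency_loss(rgb, depth))  # ~0.0: edges coincide
```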