A Three-Branch Cross-Modal Interactive Network for RGB-D Salient Defect Detection

Published: 2025, Last Modified: 06 Nov 2025, IEEE Trans. Instrum. Meas. 2025, CC BY-SA 4.0
Abstract: RGB images of defects are rich in color and texture information, whereas depth images clearly reveal defect shapes and boundaries. Motivated by this complementarity, we propose a three-branch cross-modal interactive network for RGB-D salient defect detection (TCI-Net). Specifically, we first construct a true three-stream encoder–decoder network that operates at both the image level and the feature level. Its three branches take RGB, RGB-D, and depth images as input, respectively, so that the complementary information in each modality can be fully extracted. In the encoder stage, a differential guidance module (DGM) guides the RGB branch to learn the boundary and shape features of defects, while a fusion perception module (FPM) helps the depth branch encode more textural information. In addition, a cross-modal feature refinement module (CFRM) bridges the feature gap between modalities and strengthens their interaction. Finally, in the decoder stage, boundary map supervision and a semantic guidance module (SGM) enhance the details and contextual semantics of defects while progressively restoring the spatial resolution. Extensive experiments and analyses on the RGB-D defect dataset NEU RSDDS-AUG demonstrate that TCI-Net achieves significantly higher segmentation accuracy than state-of-the-art algorithms.
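The abstract does not give implementation details, so the following is only a hedged illustration of the data side of such a design. It is a minimal sketch of how one training sample might be prepared for a three-branch network with boundary supervision. It makes two assumptions: (i) the RGB-D branch receives the channel-wise concatenation of the RGB image and the depth map, and (ii) the boundary target is derived from the binary ground-truth mask by marking foreground pixels that have a background 4-neighbour. Neither choice, nor the helper names `make_branch_inputs` and `boundary_map`, comes from TCI-Net; all are hypothetical. Images are plain nested lists so the sketch needs no deep-learning library.

```python
from typing import Dict, List

Image = List[List[List[float]]]  # H x W x C, pixel values as floats
Mask = List[List[int]]           # H x W, values in {0, 1}


def make_branch_inputs(rgb: Image, depth: List[List[float]]) -> Dict[str, Image]:
    """Build the hypothetical per-branch inputs: RGB (3-ch), RGB-D (4-ch), depth (1-ch).

    Assumption: the RGB-D input is the image-level channel concatenation
    of RGB and depth; the abstract does not specify how it is formed.
    """
    h, w = len(rgb), len(rgb[0])
    if len(depth) != h or len(depth[0]) != w:
        raise ValueError("RGB and depth must have the same spatial size")
    # Wrap each depth value so the depth input is H x W x 1.
    depth_in = [[[depth[y][x]] for x in range(w)] for y in range(h)]
    # Append the depth value to each RGB pixel, giving H x W x 4.
    rgbd_in = [[rgb[y][x] + [depth[y][x]] for x in range(w)] for y in range(h)]
    return {"rgb": rgb, "rgbd": rgbd_in, "depth": depth_in}


def boundary_map(mask: Mask) -> Mask:
    """Mark foreground pixels with at least one background 4-neighbour.

    Pixels outside the image count as background (zero padding), so a
    defect touching the image border gets a boundary there as well.
    Assumption: the paper's exact boundary extraction is not stated.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 0:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or mask[ny][nx] == 0:
                    out[y][x] = 1
                    break
    return out


if __name__ == "__main__":
    rgb = [[[0.5, 0.5, 0.5] for _ in range(5)] for _ in range(5)]
    depth = [[0.1 * (x + y) for x in range(5)] for y in range(5)]
    # 3x3 square defect in the middle of a 5x5 image.
    gt = [[1 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)] for y in range(5)]
    inputs = make_branch_inputs(rgb, depth)
    print(len(inputs["rgbd"][0][0]))  # 4 channels in the RGB-D input
    for row in boundary_map(gt):       # ring around the square; centre pixel stays 0
        print(row)
```

In a real network these inputs would be tensors fed to three encoders, and the boundary map would serve as an extra supervision target alongside the saliency mask. The nested-list form only keeps the example dependency-free.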