CFEINet: Cross-fusion and feature enhancement interaction network for RGB-D semantic segmentation

Published: 01 Jan 2025, Last Modified: 10 Apr 2025. Digit. Signal Process. 2025. License: CC BY-SA 4.0
Abstract: Significant progress has been made in RGB-D semantic segmentation research. However, two major issues persist: low-quality depth images and the difficulty of cross-modal feature interaction. To address these, this paper proposes CFEINet, a cross-fusion and feature enhancement interaction network for RGB-D semantic segmentation. CFEINet comprises three main components: the Two-Branch Asymmetric Feature Enhancement Module (TAEM), the Cross-Modal Feature Interaction Refinement Module (CFIM), and the Information Interaction Fusion Extraction Module (IIFM). TAEM employs a two-branch asymmetric enhancement strategy to mitigate the impact of low-quality depth images, enhancing depth and RGB features through boundary adaptation and channel focus, respectively. CFIM emphasizes feature consistency across modalities, enabling interaction between RGB and depth features to improve their quality. IIFM synchronizes the two modalities and exploits global-local information to compensate for inter-modal differences, thereby strengthening target feature capture and improving segmentation performance. Extensive experiments on the NYU Depth V2 and SUN RGB-D datasets demonstrate the superior performance of the proposed model compared to state-of-the-art methods.
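The abstract does not specify the internals of CFIM, but a common way to realize cross-modal feature interaction of the kind described (RGB and depth features refining each other) is mutual channel-attention gating. The sketch below is purely illustrative under that assumption; the module name `CrossModalFusion` and all layer choices are hypothetical, not the paper's actual design.

```python
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Hypothetical cross-modal interaction sketch (NOT the paper's CFIM):
    each modality is reweighted by channel attention computed from the
    other modality, then the two refined feature maps are summed."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()

        def gate() -> nn.Sequential:
            # Squeeze-and-excitation-style channel gate
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, kernel_size=1),
                nn.Sigmoid(),
            )

        self.gate_from_depth = gate()  # attention derived from depth, applied to RGB
        self.gate_from_rgb = gate()    # attention derived from RGB, applied to depth

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        rgb_refined = rgb * self.gate_from_depth(depth)   # depth guides RGB
        depth_refined = depth * self.gate_from_rgb(rgb)   # RGB guides depth
        return rgb_refined + depth_refined                # fused features


# Usage: fuse two same-shape feature maps (batch 2, 64 channels, 32x32)
fusion = CrossModalFusion(channels=64)
rgb_feat = torch.randn(2, 64, 32, 32)
depth_feat = torch.randn(2, 64, 32, 32)
fused = fusion(rgb_feat, depth_feat)
print(tuple(fused.shape))  # (2, 64, 32, 32)
```

The symmetric gating lets each modality compensate for the other's weaknesses (e.g. noisy depth regions can be down-weighted by RGB-derived attention), which matches the interaction behavior the abstract attributes to CFIM, though the actual module may differ substantially.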