Joint Semantic Segmentation of Optical and SAR Images in Hazy Environments via Cross-modal Information Rectification and Cross-attention Fusion
Abstract: Semantic segmentation is crucial in remote sensing image processing. In recent years, multi-modal semantic segmentation that fuses optical and SAR images has gained attention for its strong results. Current research faces two main problems: 1) existing fusion methods are designed for clear images and struggle in harsh weather; 2) current fusion methods insufficiently capture the correlation between multi-modal information. This paper presents a network for joint semantic segmentation of optical and SAR images in hazy environments that incorporates channel fusion for feature enhancement and cross-attention for feature fusion, enabling efficient segmentation of hazy optical images. First, we design a channel-guided cross-modal information rectification module. This module treats fog as noise and uses high-confidence features of one modality to calibrate the other, thereby reducing the impact of fog. Second, to address the insufficient fusion of multi-modal information, we introduce a cross-attention fusion module that effectively combines the complementary information from the two streams, leveraging the large receptive field enabled by self-attention. Experimental results show that the proposed method achieves better performance on semantic segmentation of hazy remote sensing images.
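The cross-attention fusion idea described above can be illustrated with a minimal sketch: each modality's feature map supplies queries that attend over the full spatial extent of the other modality, giving every position a global receptive field over the complementary stream. This is not the paper's implementation; the module name, dimensions, and the final concatenation-plus-projection step are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Illustrative bidirectional cross-attention fusion of two modality
    feature maps (a sketch, not the paper's actual module)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Optical queries attend over SAR tokens, and vice versa.
        self.opt_from_sar = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.sar_from_opt = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Assumed fusion head: concatenate both enhanced streams, project back.
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, opt: torch.Tensor, sar: torch.Tensor) -> torch.Tensor:
        # opt, sar: (B, C, H, W) feature maps -> (B, H*W, C) token sequences
        b, c, h, w = opt.shape
        opt_t = opt.flatten(2).transpose(1, 2)
        sar_t = sar.flatten(2).transpose(1, 2)
        # Each modality queries the other, so attention spans the whole
        # spatial extent of the complementary stream (large receptive field).
        opt_enh, _ = self.opt_from_sar(opt_t, sar_t, sar_t)
        sar_enh, _ = self.sar_from_opt(sar_t, opt_t, opt_t)
        fused = self.proj(torch.cat([opt_enh, sar_enh], dim=-1))
        return fused.transpose(1, 2).reshape(b, c, h, w)

if __name__ == "__main__":
    # Toy optical and SAR feature maps of matching shape.
    opt = torch.randn(2, 32, 8, 8)
    sar = torch.randn(2, 32, 8, 8)
    fused = CrossAttentionFusion(dim=32)(opt, sar)
    print(fused.shape)  # same shape as either input: (2, 32, 8, 8)
```

The output keeps the input feature-map shape, so such a block can be dropped between encoder stages of a two-stream segmentation backbone.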