SwinTFNet: Dual-Stream Transformer With Cross Attention Fusion for Land Cover Classification

Published: 01 Jan 2024 · Last Modified: 13 Nov 2024 · IEEE Geosci. Remote. Sens. Lett. 2024 · CC BY-SA 4.0
Abstract: Land cover classification (LCC) is an important application of remote sensing data interpretation. Of the two common data sources, synthetic aperture radar (SAR) images serve as an effective complement to optical images, reducing the limitations of single-modal data. However, most LCC methods focus on designing advanced network architectures for single-modal remote sensing data; few works have aimed at improving segmentation performance by fusing multimodal data. To deeply integrate SAR and optical features, we propose SwinTFNet, a dual-stream deep fusion network. Through the global context modeling capability of the Transformer structure, SwinTFNet models teleconnections between pixels in cloud-covered regions and pixels in other regions, yielding better predictions under cloud cover. In addition, a cross-attention fusion module (CAFM) is proposed to fuse features from optical and SAR data. Experimental results show that our method substantially improves the classification of clouded images compared with other strong segmentation methods and achieves the best performance on multimodal data. The source code of SwinTFNet is publicly available at https://github.com/XD-MG/SwinTFNet .
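The abstract describes a cross-attention fusion module (CAFM) in which the optical and SAR feature streams attend to each other before being merged. The paper's actual implementation is in the linked repository; the snippet below is only a minimal sketch of the general bidirectional cross-attention fusion idea, with all names (`CrossAttentionFusion`, `d_model`, the final linear projection) being illustrative assumptions rather than the authors' design.

```python
# Sketch of bidirectional cross-attention fusion between two modality
# streams (optical and SAR), as a generic stand-in for the CAFM idea.
# All module and parameter names here are illustrative assumptions.
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        # Optical tokens query SAR tokens, and vice versa.
        self.opt_to_sar = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.sar_to_opt = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Merge the two cross-attended streams back to d_model channels.
        self.proj = nn.Linear(2 * d_model, d_model)

    def forward(self, opt_tokens: torch.Tensor, sar_tokens: torch.Tensor) -> torch.Tensor:
        # Both inputs: (batch, num_tokens, d_model), e.g. flattened patch tokens.
        opt_enhanced, _ = self.opt_to_sar(opt_tokens, sar_tokens, sar_tokens)
        sar_enhanced, _ = self.sar_to_opt(sar_tokens, opt_tokens, opt_tokens)
        return self.proj(torch.cat([opt_enhanced, sar_enhanced], dim=-1))


fusion = CrossAttentionFusion()
opt = torch.randn(2, 16, 64)  # e.g. 4x4 grid of optical patch tokens
sar = torch.randn(2, 16, 64)  # matching SAR patch tokens
fused = fusion(opt, sar)      # (2, 16, 64): one fused token per position
```

The design choice sketched here, letting each modality query the other and concatenating the results, is one common way to realize symmetric cross-modal fusion; the actual CAFM may differ in structure and normalization.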