Abstract: Semantic segmentation of remote sensing images is widely used in crop cover and crop type analysis and in environmental monitoring. Because of the particular characteristics of remote sensing images, their semantic segmentation requires not only local context but also global context information, which plays an important role. Inspired by the powerful global modeling capability of Swin Transformer, we propose the Local and Spatial Enhancement Net (LSENet), which follows the encoder–decoder architecture of UNet. In the encoding stage, we propose a spatial enhancement module (SEM), which helps Swin Transformer further enhance feature extraction by encoding spatial information. In the decoding stage, we propose a local enhancement module (LEM), which is embedded in the Swin Transformer to help the network capture more local semantic information and thus classify pixels more accurately. This is especially true in edge regions, where adding LEM yields smoother edges. Experimental results on the Vaihingen and Potsdam datasets demonstrate the effectiveness of the proposed method: LSENet achieves an mIoU of 78.58% on the Potsdam dataset and 72.59% on the Vaihingen dataset.
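The mIoU figures above refer to mean intersection over union: for each class c, IoU_c = TP_c / (TP_c + FP_c + FN_c), and mIoU is the average of IoU_c over classes. The following is a minimal pure-Python sketch of that standard definition, computed from a confusion matrix. It assumes integer class labels 0..K-1 given as flattened per-pixel lists. The function names and the toy labels are hypothetical, and the sketch does not reproduce any dataset-specific evaluation conventions (such as excluded classes or boundary handling) that the paper may use.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Build a K x K count matrix: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average of per-class IoU = TP / (TP + FP + FN).

    Classes whose union is empty (absent from both labels and predictions)
    are skipped rather than counted as 0 or 1.
    """
    k = len(cm)
    ious = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                         # pixels of class c predicted as another class
        fp = sum(cm[r][c] for r in range(k)) - tp    # pixels of other classes predicted as c
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious)


# Hypothetical toy example: 8 pixels, 3 classes (not data from the paper).
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
print(round(mean_iou(cm), 4))  # prints 0.4444
```

Here the per-class IoUs are 1/3, 1/2 and 1/2, so the mean is 4/9. Accumulating a single confusion matrix over all test pixels before computing IoU, as done here, is a common choice; averaging per-image IoUs instead can give different numbers.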
External IDs: dblp:journals/lgrs/DingXLZC24