PCNeXt: Convolution is All You Need for Semantic Segmentation of Remote Sensing Images

Published: 01 Jan 2024, Last Modified: 20 May 2025 · IGARSS 2024 · CC BY-SA 4.0
Abstract: Semantic segmentation of remote sensing images is crucial for various applications, including land use mapping and environmental monitoring. However, most CNNs lack the ability to capture long-range context due to their limited receptive fields. While Transformers adopt multi-head self-attention mechanisms to capture long-range context for better accuracy, this often leads to high parameter counts and computational complexity. In this paper, we propose PCNeXt, a lightweight pure-convolutional neural network for semantic segmentation of remote sensing images. For the encoder, we design a pure-convolutional lightweight module named MSWCA, based on MSCA, the key component of the SegNeXt encoder. For the decoder, we design a purely Convolutional Global-Local Block (CGLB) to replace the GLTB module of UNetFormer; it uses the MSWCA module instead of self-attention to capture global context and a simple convolutional operation to capture local context. The experiments show that, without the self-attention of Transformers, the proposed PCNeXt achieves competitive accuracy using only convolutional structures. Notably, PCNeXt achieves 91.53% aAcc and 83.9% mIoU on the Potsdam dataset, with only 4.42M parameters and 6.87G FLOPs.