Learning to Segment with Uncertainty: Mask-Guided Feature Refinement and Multi-Path Fusion for High-Resolution Remote Sensing Images
Abstract: Semantic segmentation of high resolution remote sensing imagery plays a crucial role in land use monitoring and urban planning. Although deep learning-based methods have achieved significant progress and can obtain satisfactory segmentation results, they suffer from limitations in robustness and reliability, and tend to produce overconfident predictions that often overlook the inherent uncertainty in the segmentation process. This poses challenges for their deployment in safety-critical applications such as land detection and disaster assessment. Moreover, despite the widespread adoption of encoder-decoder architectures, existing methods still struggle to fully exploit the high-dimensional features extracted by encoders. To address these issues, this study proposes an Uncertainty Mask Feature Refinement Module and a Multi-Path Feature Interaction structure. Specifically, the proposed method computes total uncertainty through Monte Carlo sampling and learns a binary mask by applying learnable perturbations to the original data, thereby identifying regions that contribute significantly to output uncertainty. This approach quantifies the inherent uncertainty in the data and refines feature representations in high-uncertainty regions. After the encoder encodes the refined features, each decoder layer not only aggregates features from the corresponding encoder layer but also incorporates features from deeper encoding levels, achieving fine-grained segmentation results through multi-path feature interaction. We conducted comprehensive experiments on two well-known high-resolution remote sensing benchmark datasets (ISPRS Vaihingen and ISPRS Potsdam). The results demonstrate that the proposed method outperforms state-of-the-art approaches in segmentation accuracy while explicitly constructing uncertainty estimates, providing new insights for developing more robust and reliable remote sensing image segmentation models.
Loading