LD-Net: Semantic Segmentation of High-Resolution Images via Learnable Patch Proposal and Dynamic Refinement
Abstract: Semantic segmentation of ultra-high-resolution images is challenging due to its large GPU memory requirement. Previous methods either downsample the image or crop it into patches to fit within GPU memory limits, which loses detailed information or global context, respectively. To address this problem, we propose a multi-branch network, the Learnable Patch Proposal and Dynamic Refinement Network (LD-Net), for semantic segmentation of high-resolution aerial images. Instead of feeding all patches into the network for fusion, we estimate the uncertainty of each patch and select only the most valuable ones, avoiding redundant computation and improving efficiency. To improve accuracy, we propose a Dynamic Refinement module: when fusing features, it uses global features (from the downsampled image) as queries and local features (from patches) as keys and values, and vice versa. In this way, the two types of features are effectively integrated. Experimental results show that our method achieves remarkable improvements over previous state-of-the-art methods on two benchmark datasets, DeepGlobe and ISIC.
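The query/key/value fusion described above is a form of bidirectional cross-attention. A minimal sketch of the idea, with illustrative names, shapes, and a plain numpy scaled dot-product attention (not the paper's actual implementation):

```python
import numpy as np

def cross_attention(query, key, value):
    """Scaled dot-product attention: each query token attends over key/value tokens."""
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)               # (Nq, Nk) similarity logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ value                            # (Nq, d) fused features

rng = np.random.default_rng(0)
global_feats = rng.standard_normal((16, 64))    # hypothetical tokens from the downsampled image
local_feats = rng.standard_normal((256, 64))    # hypothetical tokens from a high-res patch

# Global queries attend to local keys/values, and vice versa, so each
# branch is refined with information from the other.
global_refined = cross_attention(global_feats, local_feats, local_feats)
local_refined = cross_attention(local_feats, global_feats, global_feats)

print(global_refined.shape, local_refined.shape)  # (16, 64) (256, 64)
```

Each output token keeps its own branch's resolution while mixing in features from the other branch, which matches the stated goal of integrating global context with local detail.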