Abstract: Pixel-level annotation is an important task in the intelligent processing of remote sensing images. For such tasks, interactive image segmentation (IIS) models driven by click prompts are advancing rapidly in the natural-image domain. However, most of these models are poorly suited to remote sensing images because of how they design click prompts and how those prompts interact with image information. To address this, we adopted a DETR-like model as the basic framework and redesigned its pixel decoder and transformer decoder for IIS on remote sensing images. In the pixel decoder, we designed a feature-encoded click prompt to learn click information, together with a composite attention structure that facilitates interaction between click and image information, so that the image features at the click locations more easily dominate the annotation masks. In the transformer decoder, we used deformable attention with only a single initialized query to obtain both the annotation mask and an IoU prediction. We trained our model on a composite remote sensing dataset and evaluated it on external datasets. The results demonstrated the model's adaptability, with superior performance compared with existing methods. The code will be available at: https://github.com/songbingze/ClickPromptRSIIS
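The abstract does not specify layer details, so the NumPy sketch below is only a rough, hypothetical illustration of the described data flow, not the authors' implementation. It makes three assumptions. First, each click token is the image feature at the click location plus a positive/negative type embedding. Second, image tokens attend to the click tokens with a residual connection, which is a simple stand-in for the composite attention. Third, a single query attends to all pixels and then yields mask logits and an IoU score. Plain dot-product attention substitutes for deformable attention here. All function names (`encode_clicks`, `click_image_attention`, `single_query_decode`), shapes, and the random weights are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def encode_clicks(feat, clicks, type_emb):
    """Hypothetical click encoding: image feature at each click + pos/neg embedding.

    feat: (C, H, W) image features; clicks: list of (row, col, is_positive).
    type_emb: (2, C); row 0 = negative click, row 1 = positive click (assumed learned).
    """
    tokens = [feat[:, r, c] + type_emb[int(pos)] for r, c, pos in clicks]
    return np.stack(tokens)  # (N, C)

def click_image_attention(feat, click_tokens):
    """Image tokens attend to click tokens; the result is added back as a residual.

    A simple stand-in for a click/image interaction block, not the paper's design.
    """
    C, H, W = feat.shape
    img = feat.reshape(C, H * W).T                      # (HW, C)
    attn = softmax(img @ click_tokens.T / np.sqrt(C))   # (HW, N)
    fused = img + attn @ click_tokens                   # (HW, C)
    return fused.T.reshape(C, H, W)

def single_query_decode(feat, query, iou_w):
    """One query attends to every pixel, then yields mask logits and an IoU score.

    Standard global attention is used here in place of deformable attention.
    """
    C, H, W = feat.shape
    img = feat.reshape(C, H * W).T                      # (HW, C)
    attn = softmax(query @ img.T / np.sqrt(C))          # (HW,)
    query = query + attn @ img                          # updated query, (C,)
    mask_logits = (img @ query).reshape(H, W)           # per-pixel mask logits
    iou_pred = 1.0 / (1.0 + np.exp(-(iou_w @ query)))   # sigmoid -> (0, 1)
    return mask_logits, iou_pred

# Usage with random features and weights (illustrative only).
rng = np.random.default_rng(0)
C, H, W = 16, 8, 8
feat = rng.standard_normal((C, H, W))
type_emb = rng.standard_normal((2, C))
clicks = [(2, 3, True), (6, 6, False)]   # one positive and one negative click
tokens = encode_clicks(feat, clicks, type_emb)
fused = click_image_attention(feat, tokens)
mask_logits, iou = single_query_decode(fused, np.zeros(C), rng.standard_normal(C))
mask = mask_logits > 0                   # binary annotation mask
```

In a trained model, `type_emb`, the initial query, and `iou_w` would be learned parameters rather than random draws. Using a single query, as the abstract describes, means the decoder outputs exactly one mask per interaction instead of a set of candidate masks.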
External IDs: dblp:journals/tgrs/SongLLZCXZL25