Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation

Sihan Liu; Yiwei Ma; Xiaoqing Zhang; Haowei Wang; Jiayi Ji; Xiaoshuai Sun; Rongrong Ji

Published: 01 Jan 2024, Last Modified: 22 May 2025, CVPR 2024, CC BY-SA 4.0

Abstract: Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing. Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery, leading to suboptimal segmentation results. To address these challenges, we introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS. RMSIN incorporates an Intra-scale Interaction Module (IIM) to effectively address the fine-grained detail required at multiple scales and a Cross-scale Interaction Module (CIM) for integrating these details coherently across the network. Furthermore, RMSIN employs an Adaptive Rotated Convolution (ARC) to account for the diverse orientations of objects, a novel contribution that significantly enhances segmentation accuracy. To assess the efficacy of RMSIN, we have curated an expansive dataset comprising 17,402 image-caption-mask triplets, which is unparalleled in terms of scale and variety. This dataset not only presents the model with a wide range of spatial and rotational scenarios but also establishes a stringent benchmark for the RRSIS task, ensuring a rigorous evaluation of performance. Experimental evaluations demonstrate the exceptional performance of RMSIN, surpassing existing state-of-the-art models by a significant margin. Datasets and code are available at https://github.com/Lsan2401/RMSIN.
