From Coarse To Fine: An Offline-Online Approach for Remote Sensing Cross-Modal Retrieval

Wenqian Zhou, Hanlin Wu

Published: 2024, Last Modified: 04 Apr 2026, IGARSS 2024, CC BY-SA 4.0
Abstract: Remote sensing image-text retrieval (RSITR) has recently attracted considerable attention in the remote sensing (RS) community. Early dual-stream methods, such as hashing-based retrieval, offered fast retrieval but suffered from low accuracy. Recent single-stream architectures have markedly improved retrieval accuracy but have largely overlooked retrieval efficiency on large-scale RS datasets. To balance efficiency and accuracy, we propose a coarse-to-fine two-stage framework for image-text retrieval (C2F-ITR). In the coarse matching stage, we employ a dual-stream model to identify candidate results for an image or text query. In the fine reranking stage, we use a single-stream model to rerank these candidates. To improve retrieval efficiency, we compute all image and text embeddings independently offline and reuse them online for both coarse matching and fine reranking. Extensive experiments on public datasets demonstrate that our method significantly outperforms existing approaches.
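The abstract describes the offline-online pipeline only at a high level, so the following is a minimal Python sketch of its control flow under stated assumptions, not the authors' implementation. The embeddings are toy vectors standing in for encoder outputs; `make_toy_fine_scorer` is a hypothetical placeholder for the single-stream reranking model (which in practice would be a cross-modal network consuming the cached embeddings); all function names, the gallery items, and the value of `k` are illustrative.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
# Hypothetical stand-in for the single-stream model: (query_emb, candidate_emb) -> score.
FineScorer = Callable[[Sequence[float], Sequence[float]], float]


def l2_normalize(v: Sequence[float]) -> Vector:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Both inputs are assumed L2-normalized, so the dot product equals the cosine.
    return sum(x * y for x, y in zip(a, b))


def build_offline_index(items: Dict[str, Vector]) -> Dict[str, Vector]:
    # Offline: encode (here: just normalize toy vectors) once and cache the results.
    return {key: l2_normalize(vec) for key, vec in items.items()}


def coarse_match(query: Vector, index: Dict[str, Vector], k: int) -> List[str]:
    # Coarse stage: cheap dual-stream similarity against every cached embedding, keep top-k.
    q = l2_normalize(query)
    return heapq.nlargest(k, index, key=lambda key: cosine(q, index[key]))


def fine_rerank(query: Vector, index: Dict[str, Vector], candidates: List[str],
                scorer: FineScorer) -> List[Tuple[str, float]]:
    # Fine stage: the expensive scorer runs only on the k candidates and reuses
    # the cached embeddings instead of re-encoding the gallery online.
    q = l2_normalize(query)
    scored = [(key, scorer(q, index[key])) for key in candidates]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)


def two_stage_retrieve(query: Vector, index: Dict[str, Vector], k: int,
                       scorer: FineScorer) -> List[Tuple[str, float]]:
    return fine_rerank(query, index, coarse_match(query, index, k), scorer)


def make_toy_fine_scorer(weights: Sequence[float]) -> FineScorer:
    # Hypothetical placeholder: a per-dimension weighted interaction, NOT a real model.
    def score(q: Sequence[float], c: Sequence[float]) -> float:
        return sum(w * x * y for w, x, y in zip(weights, q, c))
    return score


# Usage with made-up data: a text query against a tiny image gallery.
images = {
    "img_airport": [0.9, 0.1, 0.0],
    "img_harbor": [0.7, 0.7, 0.0],
    "img_forest": [0.0, 0.2, 1.0],
}
index = build_offline_index(images)
query = [1.0, 0.5, 0.0]
print(coarse_match(query, index, k=2))  # ['img_harbor', 'img_airport']
ranked = two_stage_retrieve(query, index, k=2, scorer=make_toy_fine_scorer([1.0, 0.1, 0.1]))
print([key for key, _ in ranked])  # ['img_airport', 'img_harbor']
```

In this toy run the coarse stage discards `img_forest` and the reranker reverses the order of the two survivors, which is the division of labour the abstract describes: a dot product over the whole gallery, then a costlier scorer over only `k` candidates, with no online re-encoding.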