A Transformer-Based Cross-Modal Image-Text Retrieval Method using Feature Decoupling and Reconstruction

Published: 2022, Last Modified: 13 Nov 2024, IGARSS 2022, CC BY-SA 4.0
Abstract: With the increasing application of remote sensing technology, cross-modal retrieval of remote sensing images (CMRRS) has attracted widespread attention. Existing methods often map the features of different modalities entirely into a shared space without decoupling modal-invariant information from modal-heterogeneous information, which leaves redundant information in the feature mapping and usually yields sub-optimal retrieval performance. This paper proposes a Transformer-based CMRRS method using feature decoupling and reconstruction (TBFDR) to solve this problem. TBFDR achieves state-of-the-art performance on the remote sensing image-text retrieval task on the Sydney-Captions dataset.