Abstract: Remote sensing image-text retrieval aims to retrieve valuable information from diverse and complex remote sensing data and has attracted significant attention. However, retrieval performance is limited by the complexity of remote sensing scenes and by their substantial content differences from natural-domain images. To address these issues, we propose a simple but effective text-guided knowledge transfer (TGKT) method for remote sensing image-text retrieval. TGKT uses CLIP to encode remote sensing data and transfers its rich semantic knowledge from the natural domain to the remote sensing domain. Textual information, which does not exhibit significant domain differences, is employed to bridge the semantic gap between the two domains, thereby enhancing image features. Extensive experiments on the RSICD and RSITMD datasets demonstrate the effectiveness of our method.