Whole Semantic Sparse Coding Network for Remote Sensing Image–Text Retrieval

Chengyu Zheng, Qi Wen, Xiu Li, Chenxue Yang, Jie Nie, Yiyun Guo, Yuntao Qian, Zhiqiang Wei

Published: 01 Jan 2025, Last Modified: 05 Nov 2025 · IEEE Transactions on Geoscience and Remote Sensing · CC BY-SA 4.0
Abstract: In recent years, cross-modal text–image retrieval in remote sensing (RS) has gained prominence as a research focus due to its potential to provide abundant, inclusive, and multiperspective information. However, existing methods usually concentrate on salient features, which cannot describe an image or text completely, so important details and discriminative information are lost. In this article, a whole semantic sparse coding network (WSSCN) is proposed for RS image–text retrieval to build a complete and reliable feature description and thereby further improve retrieval performance. Specifically, the WSSCN first introduces a whole semantic sparse representation coding (WSSRC) module, which constructs a robust semantic library to transform the dense feature matrices of the image and text into whole semantic sparse matrices; this enables multiple semantics to be decoupled and yields a more precise and detailed feature expression. Afterward, an intramodal and intermodal consistency (IIMC) module is devised to improve the consistency of the whole semantic sparse representations both within and across modalities. Finally, a salient and whole semantic adaptive learning (SWSAL) loss is proposed to focus on salient or whole semantic information by computing an adjustment parameter. Quantitative and qualitative experiments on four datasets for cross-modal retrieval in RS demonstrate the notable effectiveness of the WSSCN.
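The abstract describes three components: sparse coding of dense features against a semantic library (WSSRC), intramodal/intermodal consistency constraints (IIMC), and an adaptive blend of salient and whole-semantic objectives (SWSAL). The sketch below is a minimal illustration of these ideas, assuming a learnable semantic library with an ISTA-style soft-thresholding step; all class names, shapes, step sizes, and loss forms are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WholeSemanticSparseCoder(nn.Module):
    """Illustrative sparse coding against a shared semantic library.

    Dense features (B, D) are projected onto K library atoms and
    sparsified by soft-thresholding (unrolled ISTA). This is an
    assumed formulation, not the paper's exact WSSRC module.
    """
    def __init__(self, dim=512, num_atoms=256, threshold=0.1, steps=3, step_size=0.5):
        super().__init__()
        self.library = nn.Parameter(torch.randn(num_atoms, dim) * 0.02)
        self.threshold = threshold
        self.steps = steps
        self.step_size = step_size

    def forward(self, dense):                       # dense: (B, D)
        atoms = F.normalize(self.library, dim=-1)   # (K, D) unit-norm atoms
        code = dense.new_zeros(dense.size(0), atoms.size(0))
        for _ in range(self.steps):
            residual = dense - code @ atoms         # reconstruction error (B, D)
            code = code + self.step_size * (residual @ atoms.t())  # gradient step
            code = torch.sign(code) * F.relu(code.abs() - self.threshold)  # shrinkage
        return code                                 # (B, K) sparse semantic code


def consistency_losses(img_code, txt_code):
    """Assumed intramodal + intermodal consistency terms on the sparse codes."""
    img_sim = F.normalize(img_code, dim=-1) @ F.normalize(img_code, dim=-1).t()
    txt_sim = F.normalize(txt_code, dim=-1) @ F.normalize(txt_code, dim=-1).t()
    intra = F.mse_loss(img_sim, txt_sim)            # align within-modal similarity structure
    inter = 1 - F.cosine_similarity(img_code, txt_code, dim=-1).mean()  # align paired codes
    return intra, inter


def adaptive_loss(salient_loss, whole_loss, alpha):
    """Assumed adaptive weighting between salient and whole-semantic objectives."""
    return alpha * salient_loss + (1 - alpha) * whole_loss


if __name__ == "__main__":
    coder = WholeSemanticSparseCoder()
    img_feat, txt_feat = torch.randn(8, 512), torch.randn(8, 512)
    img_code, txt_code = coder(img_feat), coder(txt_feat)
    intra, inter = consistency_losses(img_code, txt_code)
    total = adaptive_loss(salient_loss=inter, whole_loss=intra, alpha=0.5)
    print(total.item())
```

The usage block at the bottom simply runs both modalities through the same coder and combines the two consistency terms with a fixed weight; in the paper the adjustment parameter is computed rather than fixed.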