Abstract: Recently, remote sensing (RS) text-image retrieval has gained increasing attention among researchers due to its capability to provide abundant, inclusive, and multiperspective information. To extract salient representations of the two modalities and realize information alignment, existing methods usually apply convolution, matrix multiplication, and attention mechanisms to model features in the spatial domain. However, unlike natural images, RS data contain a considerable amount of noise, and spatial-domain calculation methods lead to information smoothing and noise exchange, ultimately reducing the robustness of the feature representation. In this article, a frequency- and spatial-domain saliency network (FSSN) is proposed that extracts saliency features in both the frequency and spatial domains to further enhance the efficiency of retrieval networks. Specifically, the FSSN first designs a frequency-domain intramodal low-pass filter (FILF) that uses the Fourier transform (FT) to convert the spatial-domain representation into the frequency-domain representation and leverages low-pass filtering to filter out noise in the image. Afterward, a spatial-domain intermodal boundary-based saliency (SIBS) module is devised, which fully utilizes positive and negative boundaries and introduces an innovative assessment mechanism to automatically recognize effective regions and words without noise shifting in the spatial domain. Finally, a frequency- and spatial-domain saliency fusion (FSSF) module is designed to effectively integrate the features obtained from the frequency and spatial domains according to the consistency of their identity masks. Quantitative and qualitative experiments conducted on four RS benchmarks showcase the notable effectiveness of the proposed joint multidomain modeling network. In particular, the proposed FSSN outperforms the best existing model, RemoteCLIP, by 7.58% on the mR metric on the UCM dataset.
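The frequency-domain low-pass filtering idea behind the FILF can be illustrated with a minimal sketch: transform an image (or feature map) with the 2-D Fourier transform, keep only the low-frequency coefficients, and transform back. This is not the authors' implementation; the hard circular mask and the `cutoff_ratio` parameter are assumptions made purely for illustration.

```python
import numpy as np

def fourier_lowpass(image: np.ndarray, cutoff_ratio: float = 0.25) -> np.ndarray:
    """Suppress high-frequency noise in a 2-D array by zeroing Fourier
    coefficients outside a centered circular low-pass mask.

    cutoff_ratio (hypothetical parameter): mask radius as a fraction of
    the distance from the spectrum center to its corner.
    """
    h, w = image.shape
    # Shift the zero-frequency component to the center of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    # Build a circular low-pass mask around the spectrum center.
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    radius = cutoff_ratio * np.hypot(cy, cx)
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2

    # Zero the high frequencies and transform back to the spatial domain.
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return filtered.real

# Usage example: smooth a noisy synthetic image.
rng = np.random.default_rng(0)
noisy = np.zeros((64, 64))
noisy[16:48, 16:48] = 1.0
noisy += 0.3 * rng.standard_normal((64, 64))
smoothed = fourier_lowpass(noisy, cutoff_ratio=0.2)
```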
DOI: 10.1109/TGRS.2025.3561626