Learning a Perspective-Invariant Descriptor for Remote Sensing Image Matching

Jia Wang, Zhiguo Qu, Lingshuang Kong, Wentao Yuan, Encai Liu, Rui Zhang, Ruigang Fu

Published: 2025, Last Modified: 29 Jan 2026. Venue: IEEE Trans. Circuits Syst. Video Technol. 2025. License: CC BY-SA 4.0

Abstract: Image descriptors are crucial in remote sensing image matching. However, the nonlinear transformations and dimensional collapse inherent in the perspective imaging process make accurate matching difficult. Existing descriptors lack a theoretical analysis of perspective distortion and fail to mine the patterns hidden in the perspective imaging process, which limits their efficacy in remote sensing image matching. To uncover these underlying patterns and devise a perspective-invariant descriptor, this paper proposes a perspective-invariant descriptor network (PIDNet). In our approach, we first analyze the remote sensing imaging process and demonstrate that it can be described in a new, conceptually simple linear space named the perspective distortion space. Second, we extract the bases of this space via the intersection-over-union (IoU) metric, so that each element in the space can be expressed as a linear combination of the bases. Finally, we utilize these bases to design and learn a perspective-invariant descriptor. The core idea of our descriptor is that each basis element corresponds to a unique imaging viewpoint, and therefore any imaging viewpoint can be represented as a linear combination of the bases. To implement PIDNet, we propose a perspective sampling network module (PSNM) based on the spatial transformer network (STN), since no existing module supports our image sampling process. Furthermore, we introduce a perspective convolutional layer (PCLayer) to extract intermediate covariant features. We then concatenate the covariant features to learn a perspective-invariant descriptor. Experimental results on three datasets, including single-modal and multi-modal images, demonstrate the superior performance of PIDNet compared to state-of-the-art methods. Our source code will be publicly available at https://github.com/jaxwangkd04/
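
To make the idea of STN-style perspective sampling concrete, the following is a minimal NumPy sketch of a generic projective sampling grid followed by bilinear interpolation. It is not the authors' PSNM: the function names `perspective_grid` and `bilinear_sample`, the normalized [-1, 1] coordinate convention, and the zero padding outside the image are all assumptions made for illustration, and the abstract does not specify how the module is actually built.

```python
import numpy as np


def perspective_grid(H, height, width):
    """Build a sampling grid by pushing normalized output coordinates
    through a 3x3 homography H (hypothetical helper, not from the paper).

    Returns an array of shape (height, width, 2) holding (x, y) source
    coordinates in the normalized range [-1, 1].
    """
    ys, xs = np.meshgrid(
        np.linspace(-1.0, 1.0, height),
        np.linspace(-1.0, 1.0, width),
        indexing="ij",
    )
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # homogeneous coords
    warped = pts @ H.T                                   # apply H to every point
    # Perspective divide; assumes the third coordinate is nonzero everywhere.
    return warped[..., :2] / warped[..., 2:3]


def bilinear_sample(image, grid):
    """Bilinearly sample a 2-D image at normalized grid locations.

    Samples that fall outside the image contribute zero (assumed padding).
    """
    h, w = image.shape
    x = (grid[..., 0] + 1.0) * 0.5 * (w - 1)  # normalized -> pixel coords
    y = (grid[..., 1] + 1.0) * 0.5 * (h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0

    def at(yy, xx):
        valid = (yy >= 0) & (yy < h) & (xx >= 0) & (xx < w)
        vals = image[np.clip(yy, 0, h - 1), np.clip(xx, 0, w - 1)]
        return np.where(valid, vals, 0.0)

    return (
        (1 - wx) * (1 - wy) * at(y0, x0)
        + wx * (1 - wy) * at(y0, x1)
        + (1 - wx) * wy * at(y1, x0)
        + wx * wy * at(y1, x1)
    )


if __name__ == "__main__":
    img = np.arange(20, dtype=float).reshape(4, 5)
    # A homography with a nonzero bottom row introduces perspective distortion.
    H = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.2, 0.0, 1.0]])
    print(bilinear_sample(img, perspective_grid(H, 4, 5)))
```

With the identity homography the grid reproduces the input image; a nonzero entry in the bottom row of `H` produces the position-dependent scaling characteristic of perspective imaging. In a learned setting, the entries of `H` would typically be predicted or fixed per viewpoint, which is a design choice this sketch leaves open.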

External IDs: dblp:journals/tcsv/WangQKYLZF25