Abstract: Retrieving images of buildings is a critical need for intelligent urban management and tourism services. Differences in shooting angle, location, and distance give images of the same building cross-scale attributes, which makes existing retrieval methods susceptible to interference from inter-class attributes and lowers retrieval accuracy. In this work, we propose a deep multi-scale spatial pyramid hash-learning framework (DMPH). The framework integrates the multi-scale representation and local perception capabilities of CNNs with the global perception capability of Transformers, producing hash codes that preserve multi-scale spatial information. Further, we explore pyramid spatial matching to achieve cross-scale building retrieval without object-level annotation. Evaluated on two publicly available building retrieval datasets, our framework outperforms the compared methods.
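The abstract only names the framework's components; as a rough illustration of how such a hybrid might be wired, the sketch below pools CNN feature maps at several spatial pyramid levels, lets a Transformer encoder give each pyramid cell a global view of the others, and binarizes the aggregate into a hash code. This is a minimal sketch under our own assumptions, not the authors' DMPH implementation: the backbone, pyramid grid sizes, and hash head are all placeholders.

```python
# Illustrative sketch (not the paper's implementation): CNN pyramid pooling
# + Transformer refinement + hash head. All names and hyperparameters here
# are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidHashNet(nn.Module):
    def __init__(self, hash_bits=64, dim=128, levels=(1, 2, 4)):
        super().__init__()
        self.levels = levels  # spatial pyramid grid sizes (1x1, 2x2, 4x4)
        # Tiny CNN backbone standing in for a real one (e.g. a ResNet stage).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Transformer encoder models global relations among pyramid cells.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.hash_head = nn.Linear(dim, hash_bits)

    def forward(self, x):
        fmap = self.backbone(x)                               # (B, dim, H, W)
        tokens = []
        for g in self.levels:                                 # pool at each level
            pooled = F.adaptive_avg_pool2d(fmap, g)           # (B, dim, g, g)
            tokens.append(pooled.flatten(2).transpose(1, 2))  # (B, g*g, dim)
        tokens = torch.cat(tokens, dim=1)                     # 1+4+16 = 21 tokens
        tokens = self.transformer(tokens)
        code = torch.tanh(self.hash_head(tokens.mean(dim=1)))  # relaxed code in (-1, 1)
        return torch.sign(code)                               # binary code at inference

net = PyramidHashNet()
codes = net(torch.randn(2, 3, 224, 224))
print(codes.shape)  # torch.Size([2, 64])
```

At training time one would typically keep the tanh-relaxed code (since torch.sign has zero gradient) and optimize a similarity-preserving hashing loss; retrieval then compares binary codes by Hamming distance.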