Abstract: Retrieving images of buildings is a critical need for intelligent urban management and tourism services. Differences in shooting angle, location, and distance give images of the same building cross-scale attributes, which makes existing retrieval methods susceptible to interference from inter-class attributes and lowers retrieval accuracy. In this work, we propose a deep multi-scale spatial pyramid hash-learning framework (DMPH). The framework integrates the multi-scale representation and local perception capabilities of CNNs with the global perception capability of Transformers, producing hash codes that preserve multi-scale spatial information. Further, we explore pyramid spatial matching to achieve cross-scale building retrieval without object-level annotation. Evaluated on two publicly available building retrieval datasets, our framework outperforms the compared methods.
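The abstract only names the framework's components; as a rough illustration of how such a hybrid might be wired, the sketch below pools CNN feature maps at several spatial pyramid levels, lets a Transformer encoder give each pyramid cell a global view of the others, and binarizes the aggregate into a hash code. This is a minimal sketch under our own assumptions, not the authors' DMPH implementation: the backbone, pyramid grid sizes, and hash head are all placeholders.

```python
# Illustrative sketch (not the paper's implementation): CNN pyramid pooling
# + Transformer refinement + hash head. All names and hyperparameters here
# are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidHashNet(nn.Module):
    def __init__(self, hash_bits=64, dim=128, levels=(1, 2, 4)):
        super().__init__()
        self.levels = levels  # spatial pyramid grid sizes (1x1, 2x2, 4x4)
        # Tiny CNN backbone standing in for a real one (e.g. a ResNet stage).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Transformer encoder models global relations among pyramid cells.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.hash_head = nn.Linear(dim, hash_bits)

    def forward(self, x):
        fmap = self.backbone(x)                               # (B, dim, H, W)
        tokens = []
        for g in self.levels:                                 # pool at each level
            pooled = F.adaptive_avg_pool2d(fmap, g)           # (B, dim, g, g)
            tokens.append(pooled.flatten(2).transpose(1, 2))  # (B, g*g, dim)
        tokens = torch.cat(tokens, dim=1)                     # 1+4+16 = 21 tokens
        tokens = self.transformer(tokens)
        code = torch.tanh(self.hash_head(tokens.mean(dim=1)))  # relaxed code in (-1, 1)
        return torch.sign(code)                               # binary code at inference

net = PyramidHashNet()
codes = net(torch.randn(2, 3, 224, 224))
print(codes.shape)  # torch.Size([2, 64])
```

At training time one would typically keep the tanh-relaxed code (since torch.sign has zero gradient) and optimize a similarity-preserving hashing loss; retrieval then compares binary codes by Hamming distance.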