Abstract: Instance retrieval is concerned with obtaining representations of instances (objects) in images and using them for similarity comparisons between instances. However, most methods require instance-level category labels to train the model, which increases the annotation burden. Building on advances in convolutional neural networks (CNNs) and Transformers in computer vision, we propose a hierarchical model with a spatial pyramid structure for weakly supervised multi-instance hash learning. It combines the local, multi-scale perception of CNNs with the global field of view of Transformers. Further, it leverages the principle of multi-instance learning, which allows the proposed model to learn an instance-level hash mapping in a weakly supervised manner. Experimental results on three public datasets show improvements over typical methods, validating the effectiveness of the proposed method.