Abstract: Hierarchical semantic structures are essential in human recognition and understanding of the world. We can recognize shared features between different entities across multiple semantic levels. However, existing deep metric learning methods primarily focus on finding discriminative features between classes on a single semantic level, neglecting some coarser features they share with other classes. This work proposes a method that arranges visual features from multiple semantic levels in a coarse-to-fine manner within a single feature space. Our approach allows us to create scalable embeddings from a single feature vector that can efficiently discriminate on multiple semantic levels and has a clear geometric interpretation. We evaluate our method on three hierarchical image retrieval datasets: DyML-Vehicle, DyML-Animal, and DyML-Product, achieving better or on-par results compared to state-of-the-art methods without compromising performance on the fine-grained level or being biased towards any other semantic level. Finally, we show that our method leads to a more intuitive and better-organized feature space.
External IDs:dblp:conf/icip/BudimirSKL24
Loading