Abstract: Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) aims to retrieve, from a large gallery, natural images that share the semantics of a hand-drawn sketch. Because simple line sketches carry far less information than detailed images, the task faces two key challenges: 1) network layers attend to semantically relevant features differently across the two domains, and 2) the sparse information in sketches hampers the extraction of discriminative features. To address these issues, we propose the Cross-Domain Feature Semantic Calibration (CD-FSC) model. CD-FSC first measures semantic correlations between domains and layers by analyzing attention maps in vision transformers, ensuring precise semantic alignment; it then transfers category associations learned in the image domain to strengthen semantic learning in the sketch domain. Extensive experiments on three popular ZS-SBIR datasets show that our model outperforms current state-of-the-art methods.
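To make the layer-wise attention analysis concrete, the following is a minimal sketch (our own illustration, not the paper's implementation) of how a per-layer cross-domain semantic correlation could be scored from ViT attention maps: for each layer, the head-averaged CLS-to-patch attention of a sketch and an image of the same category are compared by cosine similarity. The function name `layer_semantic_correlation` and the tensor layout are assumptions for this example.

```python
import torch
import torch.nn.functional as F

def layer_semantic_correlation(sketch_attn, image_attn):
    """Hypothetical illustration of per-layer cross-domain correlation.

    sketch_attn, image_attn: lists of per-layer ViT attention tensors,
    each of shape (heads, tokens, tokens), for a sketch and an image
    of the same category.
    Returns a (num_layers,) tensor of cosine similarities between the
    head-averaged CLS-to-patch attention distributions.
    """
    scores = []
    for s, i in zip(sketch_attn, image_attn):
        # Average over heads, take the CLS row (attention from the CLS
        # token to all tokens), and drop the CLS->CLS entry.
        s_cls = s.mean(dim=0)[0, 1:]
        i_cls = i.mean(dim=0)[0, 1:]
        scores.append(F.cosine_similarity(s_cls, i_cls, dim=0))
    return torch.stack(scores)
```

Under this reading, layers whose attention distributions agree across domains would be the ones where sketch and image features are semantically comparable and thus candidates for calibration.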