Triplet Bridge for Zero-Shot Sketch-Based Image Retrieval

Published: 01 Jan 2025 · Last Modified: 22 Jul 2025 · IEEE Trans. Emerg. Top. Comput. Intell. 2025 · CC BY-SA 4.0
Abstract: Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) remains a challenging task due to the scarcity of sketch data and the abstract visual information that sketches convey. Previous works focus on designing various network architectures and applying the gold-standard triplet loss to solve ZS-SBIR, but they consistently struggle to improve model generalization and to extract abstract visual information. In contrast, this work proposes a concise and effective Triplet Bridge (TriBri) framework that removes these obstacles at their root. Specifically, we use InfoNCE as the core objective to construct cross-modal representations between images and sketches, which enlarges the margin between feature clusters of different categories in the representation space and improves the generalization of the model. Furthermore, we introduce text, with its inherently abstract properties, into the framework to form a ternary relationship, and bridge the three heterogeneous gaps among the text, image, and sketch modalities with InfoNCE. In this process, the abstract visual cues common to both images and sketches can be captured by the feature extractor under the guidance of the abstract information in text. Finally, comprehensive experiments on three commonly used datasets (i.e., TU-Berlin, Sketchy, and QuickDraw) validate that our framework overcomes these obstacles in a simple yet powerful manner. Moreover, compared to state-of-the-art methods, the proposed TriBri exhibits comprehensive performance superiority.
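The ternary bridging described in the abstract can be illustrated with a minimal sketch: one InfoNCE term per heterogeneous gap (text–image, text–sketch, image–sketch), where matched triplets share an index in the batch. This is an assumption-laden illustration in NumPy, not the authors' implementation; the function names, the symmetric pairing, and the temperature value are all illustrative choices.

```python
import numpy as np

def info_nce(a, b, temperature=0.07):
    """InfoNCE loss treating row i of `a` and row i of `b` as a positive pair."""
    # Normalize embeddings so the dot product is cosine similarity.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Matched pairs lie on the diagonal; maximize their log-probability.
    return -np.mean(np.diag(log_probs))

def triplet_bridge_loss(text, image, sketch, temperature=0.07):
    """Sum of symmetric InfoNCE terms over the three modality gaps
    (an illustrative reading of the 'triplet bridge' idea)."""
    pairs = [(text, image), (text, sketch), (image, sketch)]
    return sum(info_nce(a, b, temperature) + info_nce(b, a, temperature)
               for a, b in pairs)
```

Under this reading, well-aligned triplets (embeddings of the same category close across all three modalities) yield a lower loss than mismatched ones, which is the mechanism the abstract credits for enlarging inter-class margins.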