Abstract: Zero-shot learning (ZSL) aims to leverage additional semantic information to recognize unseen classes. To transfer knowledge from seen to unseen classes, most ZSL methods learn a shared embedding space by simply aligning visual embeddings with semantic prototypes. However, methods trained under this paradigm often struggle to learn a robust embedding space because they align the two modalities in an isolated manner per class, which ignores the crucial inter-class relationships during the alignment process. To address these challenges, this article proposes a visual-semantic graph matching net (VSGMN), which leverages semantic relationships among classes to aid visual-semantic embedding. VSGMN uses a graph build net (GBN) and a graph matching net (GMN) to achieve two-stage visual-semantic alignment. Specifically, GBN first uses an embedding-based approach to build visual and semantic graphs in the semantic space and aligns each embedding with its prototype for the first-stage alignment. In addition, to supplement unseen-class relationships in these graphs, GBN also builds unseen-class nodes based on semantic relationships. In the second stage, GMN continuously integrates neighbor and cross-graph information into the constructed graph nodes and aligns the node relationships between the two graphs under the class-relationship constraint. Extensive experiments on three benchmark datasets demonstrate that VSGMN achieves superior performance in both conventional and generalized ZSL (GZSL) scenarios. The implementation of our VSGMN and experimental results are available at: https://github.com/dbwfd/VSGMN.