Label correlation preserving visual-semantic joint embedding for multi-label zero-shot learning

Published: 2025 · Last Modified: 15 Jan 2026 · Multimedia Tools and Applications 2025 · CC BY-SA 4.0
Abstract: Multi-label zero-shot learning is a branch of the classification problem that is more in line with practical applications, because a real image usually contains more than one category, and the labels to be classified may never appear in the training set. The key to solving this problem is to effectively transfer the knowledge learned on seen classes to unseen classes. To this end, a commonly used approach is to embed visual features and semantic attributes into a joint visual-semantic space and align them with a transfer matrix \(\textbf{W}\). Previous works tacitly assume that the relation between the visual embeddings and the semantic embeddings of the seen classes generalizes to the unseen classes. In this paper, however, we argue that this assumption is only an idealized hypothesis. In practice, the generalization of knowledge from seen classes to unseen classes may be weakened by the disruption of semantic label correlation. This study delves into the impact of semantic correlation on the generalization capability of models in multi-label zero-shot learning and verifies that, by preserving the semantic correlation of seen classes in the visual-semantic joint space, the model generalizes better to unseen classes. We then prove that keeping the transfer matrix \(\textbf{W}\) orthogonal preserves the semantic correlation in the joint space, and we explore a self-orthogonality regularization method to keep \(\textbf{W}\) orthogonal. Extensive experiments show that our proposed semantic correlation preserving method improves the mAP of the state-of-the-art model BiAM by 3.4% on NUS-WIDE and by 1.6% on the large-scale Open Images (V4) dataset.
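The correlation-preservation argument rests on a simple fact: if \(\textbf{W}^\top\textbf{W} = \textbf{I}\), then \(\langle \textbf{W}a_i, \textbf{W}a_j \rangle = a_i^\top \textbf{W}^\top\textbf{W} a_j = \langle a_i, a_j \rangle\), so inner products (and hence label correlations) between semantic embeddings survive the projection into the joint space. Below is a minimal PyTorch sketch of one common soft self-orthogonality penalty, \(\lVert \textbf{W}^\top\textbf{W} - \textbf{I} \rVert_F^2\), added to the task loss; the class name, the penalty weight, and the usage pattern are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class SelfOrthogonalityRegularizer(nn.Module):
    """Soft penalty ||W^T W - I||_F^2 on a transfer matrix W.

    When W^T W = I holds exactly, <W a_i, W a_j> = <a_i, a_j>, so the
    semantic label correlations of the seen classes are preserved in
    the joint visual-semantic space. (Hypothetical sketch; weight is
    an assumed hyperparameter, not a value from the paper.)
    """

    def __init__(self, weight: float = 1e-3):
        super().__init__()
        self.weight = weight

    def forward(self, W: torch.Tensor) -> torch.Tensor:
        # Gram matrix of the columns of W; deviation from the identity
        # measures how far W is from having orthonormal columns.
        gram = W.t() @ W
        identity = torch.eye(gram.size(0), device=W.device, dtype=W.dtype)
        return self.weight * (gram - identity).pow(2).sum()


# Illustrative use inside a training step (names are hypothetical):
# ortho_reg = SelfOrthogonalityRegularizer(weight=1e-3)
# loss = task_loss + ortho_reg(transfer_layer.weight)
```

Note that for a non-square \(\textbf{W}\) mapping a \(d_s\)-dimensional semantic space into a \(d_j\)-dimensional joint space, \(\textbf{W}^\top\textbf{W} = \textbf{I}\) can hold only when \(d_j \ge d_s\); the soft penalty merely pushes \(\textbf{W}\) toward this regime rather than enforcing it as a hard constraint.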