Visual Feature Disentanglement for Zero-Shot Learning

Qingzhi He, Rong Quan, Weifeng Yang, Jie Qin

Published: 2024, Last Modified: 04 Feb 2026, ICME 2024, CC BY-SA 4.0

Abstract: Generative model-based zero-shot learning (ZSL) approaches usually turn ZSL into a supervised learning problem by generating full visual features for unseen classes; these features may contain redundant or noisy information that harms ZSL classification. In this work, we propose a novel generative framework that fully exploits the visual features by disentangling them into three distinct components: semantically correlative (SC), visually discriminative (VD), and residual features. In particular, we propose an encoder-decoder disentanglement framework to learn SC features that align closely with their corresponding semantic label embeddings, as well as VD features that contribute significantly to visual classification accuracy. Additionally, we introduce a mutual information-based loss to separate the distribution of the discriminative (SC and VD) features from that of the residual ones, thereby eliminating redundant information that may hinder the final ZSL classification. We extensively evaluate the proposed method on four popular ZSL benchmarks, where the experimental results demonstrate its superiority over existing counterparts. Code is available at https://github.com/RowenaHe/VFD-ZSL.
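To make the three-way split described in the abstract concrete, the following is a minimal sketch and not the authors' implementation (see the linked repository for that). It assumes linear encoders and a linear decoder with random, untrained weights; all dimensions, names (`disentangle`, `reconstruct`, `classification_features`), and the choice to pass only the SC and VD parts to the classifier are illustrative assumptions. The semantic-alignment, classification, and mutual information losses are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: visual feature, SC, VD and residual dimensions.
FEAT_DIM, SC_DIM, VD_DIM, RES_DIM = 16, 4, 4, 4

# Assumed linear encoders, one per component. Random weights stand in
# for parameters that would normally be learned.
W_sc = rng.normal(scale=0.1, size=(FEAT_DIM, SC_DIM))
W_vd = rng.normal(scale=0.1, size=(FEAT_DIM, VD_DIM))
W_res = rng.normal(scale=0.1, size=(FEAT_DIM, RES_DIM))

# Assumed linear decoder that maps the concatenated parts back to feature space.
W_dec = rng.normal(scale=0.1, size=(SC_DIM + VD_DIM + RES_DIM, FEAT_DIM))


def disentangle(x):
    """Split visual features x of shape (batch, FEAT_DIM) into SC, VD, residual."""
    return x @ W_sc, x @ W_vd, x @ W_res


def reconstruct(sc, vd, res):
    """Decode the three components back into the visual feature space."""
    return np.concatenate([sc, vd, res], axis=1) @ W_dec


def classification_features(x):
    """Keep the discriminative parts (SC and VD) and drop the residual part."""
    sc, vd, _ = disentangle(x)
    return np.concatenate([sc, vd], axis=1)


# Toy batch of visual features (random placeholders, not real data).
x = rng.normal(size=(8, FEAT_DIM))
sc, vd, res = disentangle(x)

# Mean squared reconstruction error, the usual encoder-decoder training signal.
recon_loss = float(np.mean((reconstruct(sc, vd, res) - x) ** 2))
feats = classification_features(x)
```

In this sketch the residual part is needed only for reconstruction, which is one way to read the abstract's claim that redundant information is kept away from the final classifier.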