Abstract: Compositional zero-shot learning (CZSL) aims to recognize images from unseen compositional classes, each consisting of a state and an object concept that individually appear in seen compositional images. The key challenge of CZSL is to effectively mitigate the contextuality issue so as to achieve compositional transfer from seen classes to unseen ones: the visual appearance of the same state varies when it is combined with different objects. To address this dilemma, we propose a swap-reconstruction autoencoder (SRA) that captures the intrinsic context of ambiguous states. Specifically, SRA learns a consistent embedding space for multi-modal data, and a swap-reconstruction mechanism is designed to disentangle the visual embeddings of states and objects. A loss comprising a superclass-oriented state swap-reconstruction term and an object swap-reconstruction term models the contextual relationship between states and objects. Extensive experiments demonstrate that SRA outperforms current state-of-the-art methods on three benchmark datasets.
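To make the swap-reconstruction idea concrete, the following is a minimal, self-contained sketch (not the paper's actual architecture): a linear encoder splits an image feature into a state embedding and an object embedding, and for two images assumed to share the same object, the object embeddings are swapped before decoding; the reconstruction error then penalizes object information that leaks into the state embedding. All dimensions, weights, and function names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 8, 4  # hypothetical feature dim and per-concept embedding dim

# Hypothetical linear encoder/decoder weights; a real SRA would learn these.
W_state = rng.normal(size=(D, H))
W_obj = rng.normal(size=(D, H))
W_dec = rng.normal(size=(2 * H, D))

def encode(x):
    """Split an image feature into a state and an object embedding."""
    return x @ W_state, x @ W_obj

def decode(s, o):
    """Reconstruct the image feature from concatenated concept embeddings."""
    return np.concatenate([s, o], axis=-1) @ W_dec

def swap_recon_loss(x1, x2):
    """Object swap-reconstruction: given two images assumed to share the
    same object, swap their object embeddings, decode, and measure the
    mean squared reconstruction error."""
    s1, o1 = encode(x1)
    s2, o2 = encode(x2)
    # Each image keeps its own state embedding but takes the other
    # image's object embedding; if the embeddings are disentangled,
    # reconstruction should still succeed.
    r1 = decode(s1, o2)
    r2 = decode(s2, o1)
    return float(np.mean((r1 - x1) ** 2) + np.mean((r2 - x2) ** 2))

x1, x2 = rng.normal(size=D), rng.normal(size=D)
loss = swap_recon_loss(x1, x2)
```

The state swap-reconstruction term would be symmetric: swap state embeddings between images that share the same state. In training, this loss would be minimized jointly with classification objectives so that each embedding retains only its own concept's information.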