Consistent Representation Learning Across Modalities for Zero-Shot Image Recognition

Yu Wang, Shengjie Zhao

Published: 09 Jun 2024, Last Modified: 01 Oct 2024, ICME, CC BY 4.0

Abstract: Zero-shot learning (ZSL) has recently drawn widespread attention due to the demand for scalable object recognition in real-world scenes. Existing approaches typically focus on directly learning mapping functions from the visual space to the semantic space, or vice versa. However, these methods fail to explicitly capture consistent representations that reflect the shared nature of different modalities of the same object, which aggravates the domain shift problem in ZSL at test time. To this end, this paper proposes a consistent representation learning mechanism, based on common subspace learning, across the visual modality and the associated semantic modality. We further impose an orthogonal constraint on the subspace to obtain informative representations, as well as ℓ2,1-norm regularization terms on the projection matrices for automatic feature selection. Finally, an iterative algorithm based on the augmented Lagrange multiplier (ALM) method with an alternating direction strategy is developed to solve the proposed formulation. Extensive experiments on four popular datasets show that our algorithm achieves promising results.
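The abstract names three technical ingredients: an orthogonality constraint, ℓ2,1-norm regularization for feature selection, and an ALM solver with alternating-direction updates. The paper's exact objective is not stated here, so the sketch below does not reproduce it. Instead, it assumes a simple hypothetical stand-in problem, ℓ2,1-regularized least squares min_W ½‖XW − Y‖²_F + λ‖W‖_{2,1}, and solves it with scaled-form ADMM (one common alternating-direction variant of ALM). It also includes the SVD-based projection that is often used to handle a constraint such as PᵀP = I. All function names, the toy data, and the values of `lam` and `rho` are illustrative assumptions.

```python
import numpy as np


def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (the l2,1-norm)."""
    return float(np.sum(np.linalg.norm(W, axis=1)))


def prox_l21(V, tau):
    """Proximal operator of tau * ||.||_{2,1}: shrink each row of V toward zero.

    Rows whose norm is at most tau become exactly zero, so the penalty
    discards whole features (rows) rather than individual entries.
    """
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * V


def nearest_orthonormal(M):
    """Closest matrix with orthonormal columns to M in Frobenius norm.

    Computed from the thin SVD (the polar factor U @ Vt); a standard way to
    enforce a constraint like P^T P = I inside an alternating scheme.
    """
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt


def admm_l21_regression(X, Y, lam, rho=1.0, n_iter=500):
    """Hypothetical stand-in problem: min_W 0.5*||XW - Y||_F^2 + lam*||W||_{2,1}.

    The splitting W = Z is enforced through an augmented Lagrangian with
    scaled multiplier U; W, Z and U are updated in alternation (ADMM).
    """
    d, c = X.shape[1], Y.shape[1]
    Z = np.zeros((d, c))
    U = np.zeros((d, c))
    A = X.T @ X + rho * np.eye(d)  # fixed system matrix of the W-subproblem
    XtY = X.T @ Y
    for _ in range(n_iter):
        W = np.linalg.solve(A, XtY + rho * (Z - U))  # smooth least-squares step
        Z = prox_l21(W + U, lam / rho)               # row-sparse l2,1 step
        U = U + W - Z                                # scaled multiplier update
    return W, Z


# Toy data (an assumption for illustration): only the first 3 of 10 features matter.
rng = np.random.default_rng(0)
n, d = 300, 10
X = rng.standard_normal((n, d))
W_true = np.zeros((d, 2))
W_true[:3] = [[1.0, -1.0], [2.0, 0.5], [-1.5, 1.0]]
Y = X @ W_true

W, Z = admm_l21_regression(X, Y, lam=5.0, rho=100.0)
row_norms = np.linalg.norm(Z, axis=1)
print(np.round(row_norms, 3))

P = nearest_orthonormal(rng.standard_normal((d, 4)))
print(np.allclose(P.T @ P, np.eye(4)))  # True
```

In this synthetic setup the last seven rows of `Z` should be driven exactly to zero, which is the feature-selection effect that ℓ2,1 regularization on projection matrices is meant to provide. Keeping the nonsmooth penalty in its own Z-step lets each subproblem have a closed form, which is the usual motivation for the alternating-direction strategy.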