Learning Dual-Stream Conditional Concepts in Compositional Zero-Shot Learning

Published: 2025, Last Modified: 08 Jan 2026IEEE Trans. Pattern Anal. Mach. Intell. 2025EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Compositional Zero-Shot Learning (CZSL) aims to recognize unseen compositional concepts composed of seen single concepts. One of the problems of CZSL is to model attributes interacting with objects and objects interacting with attributes. In this work, we focus on this problem and propose Dual-Stream Conditional Network (DSCNet) that learns dual-stream conditional concepts as a solution, where the conditional visual and semantic embeddings of attributes and objects are learned. First, we argue that the condition of the attribute or object is supposed to contain the recognized object and input image, or the recognized attribute and input image. Next, for each concept which can either be an attribute or object, in the semantic stream, we propose to encode the recognized object or attribute semantic features and the input image visual features as the encoded condition, which is then injected into all concept semantic embeddings by a semantic cross encoder to acquire conditional semantic embeddings. In the visual stream, the conditional attribute or object visual embeddings are acquired by injecting the semantic features of the recognized object or attribute into the mapped attribute or object visual features. Experimental results on CZSL benchmarks demonstrate the superiority of our proposed method.
Loading