Detail Enhancement and Transfer Balance for Open-Vocabulary Compositional Zero-Shot Learning

17 Sept 2025 (modified: 23 Dec 2025), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0
Keywords: open-vocabulary compositional zero-shot learning, attribute enhancement, transfer balance
Abstract: Compositional Zero-Shot Learning (CZSL) aims to recognize unseen attribute-object compositions by learning from seen combinations of visual primitives. Recent advances extend this task to the Open-Vocabulary setting (OV-CZSL), where novel attributes or objects may appear at test time. This setting presents two major challenges: (1) global visual features often lack the granularity required to distinguish fine-grained attribute information, particularly in unseen compositions; and (2) indiscriminate knowledge transfer from seen to unseen compositions can compromise class boundaries, leading to overfitting on seen compositions. To address these issues, we propose a novel OV-CZSL framework that integrates Detail Enhancement and Transfer Balance (DETB). Specifically, we propose a Multi-scale Condition-guided Diffusion (MCD) module that selectively refines challenging samples by integrating global semantic priors with localized, disentangled visual representations, enabling the recovery of fine-grained attribute information essential for compositional recognition. Furthermore, we introduce a Transfer Balance Loss (TBL) that adaptively adjusts the semantic margins between seen and unseen compositions according to their inter-class similarity. This encourages effective knowledge transfer while maintaining clear class separation. Extensive experiments on three OV-CZSL benchmark datasets show that DETB consistently outperforms existing approaches, setting a new state of the art.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 8332