Bayesian Primitive Distributing for Compositional Zero-shot Learning

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · Readers: Everyone · License: CC BY 4.0
Keywords: Compositional Zero-shot Learning, Probability Distribution, Bayesian Inference
TL;DR: We propose a novel Bayesian-induced framework that learns a probability distribution over each primitive to model the intra-primitive variance.
Abstract: Compositional zero-shot learning (CZSL) aims to recognize unseen attribute-object combinations by learning primitive concepts (i.e., attributes and objects) from seen compositions. Existing CZSL solutions typically harness the power of vision-language models such as CLIP via textual prompt tuning and visual adapters. However, they independently learn one deterministic textual prompt for each primitive or compositional label, ignoring both the inherent semantic diversity within each primitive and the semantic relationships between primitive concepts and their compositions. In this paper, we propose BAYECZSL, a novel Bayesian-induced framework that learns a probability distribution over each primitive textual prompt. Specifically, BAYECZSL models image-specific primitive textual prompts as learnable probability distributions to capture intra-primitive diversity. Building on these primitive distributions, we aggregate the learned distributions from the attribute and object branches to form a compositional prompt space via a Compositional Distribution Synthesis strategy, thus capturing the semantic relationships between primitive concepts and their compositions. Moreover, a Three-path Distribution Enhancement module is introduced to transform the initial distributions into more expressive ones via invertible mappings. Finally, these enhanced distributions are sampled to generate diverse textual prompts, achieving more comprehensive coverage of the prompt space and better generalization to unseen compositions. Extensive experiments on multiple CZSL benchmarks demonstrate the superiority of BAYECZSL. Code will be released.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 7241