LGPro: Language-Guided Prototype Discovery for Compositional Zero-Shot Learning

Published: 27 May 2026, Last Modified: 02 Jun 2026, CompLearn 2026 Poster, CC BY 4.0
Keywords: Compositional Zero-Shot Learning, Prototype Learning, Vision-Language Models, Large Language Models, CLIP
TL;DR: We use an LLM to generate diverse visual descriptions for each primitive and cluster their CLIP embeddings into semantic prototypes that initialize, anchor, and adaptively size the visual prototype space of a CZSL model.
Abstract: Compositional Zero-Shot Learning (CZSL) requires recognizing unseen attribute-object compositions by combining knowledge from seen ones, which demands primitive representations that faithfully capture the full visual diversity of each attribute and object concept. The current state of the art, ClusPro, addresses this by learning $K$ visual prototypes per primitive via online within-primitive clustering, but it initializes prototypes randomly and applies a uniform budget $K$ across all primitives, ignoring the rich semantic structure already encoded in language. We present LGPro (Language-Guided Prototype Discovery), a framework that seeds and anchors visual prototype learning with language knowledge. In our approach, an LLM generates $N$ visually grounded descriptions per primitive, CLIP encodes them into a joint embedding space, and K-means clustering on those embeddings yields semantic prototypes. These semantic prototypes serve two roles: they initialize the visual prototype buffers with linguistically meaningful starting points, and they provide soft targets via a novel Semantic Anchoring Loss that prevents visual prototypes from drifting into language-agnostic regions during training. The per-primitive prototype budget is determined by the gap statistic over text embeddings, which allocates more prototypes to semantically rich primitives (e.g., "old", "broken") and fewer to simpler ones (e.g., "red", "blue"). All additions are either performed offline or incur negligible overhead. Experiments on MIT-States, UT-Zappos, and C-GQA under both closed-world and open-world settings demonstrate consistent improvements over the ClusPro baseline.
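The offline stage described in the abstract (cluster text embeddings, pick a per-primitive budget with the gap statistic, then use the centroids to initialize and anchor visual prototypes) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: random vectors stand in for CLIP embeddings of LLM-generated descriptions, the function names `kmeans`, `gap_statistic_budget`, and `anchoring_loss` are hypothetical, and the loss shown (one minus the cosine similarity to the nearest semantic prototype) is only an assumed form, since the abstract does not define the Semantic Anchoring Loss.

```python
import numpy as np

def kmeans(x, k, rng, iters=50):
    """Plain Lloyd's K-means; returns (centroids, labels, within-cluster sum of squares)."""
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(0)
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    labels = dists.argmin(1)
    dispersion = dists[np.arange(len(x)), labels].sum()
    return centroids, labels, dispersion

def gap_statistic_budget(x, k_max, rng, n_refs=10):
    """Smallest k with Gap(k) >= Gap(k+1) - s_(k+1), using uniform
    reference samples drawn from the bounding box of x."""
    lo, hi = x.min(0), x.max(0)
    gaps, sks = [], []
    for k in range(1, k_max + 1):
        log_w = np.log(kmeans(x, k, rng)[2] + 1e-12)
        ref_log_w = np.array([
            np.log(kmeans(rng.uniform(lo, hi, size=x.shape), k, rng)[2] + 1e-12)
            for _ in range(n_refs)
        ])
        gaps.append(ref_log_w.mean() - log_w)
        sks.append(ref_log_w.std() * np.sqrt(1 + 1 / n_refs))
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - sks[k]:
            return k
    return k_max

def anchoring_loss(visual_protos, semantic_protos):
    """ASSUMED loss form: mean of (1 - cosine similarity) between each
    visual prototype and its nearest semantic prototype."""
    v = visual_protos / np.linalg.norm(visual_protos, axis=1, keepdims=True)
    s = semantic_protos / np.linalg.norm(semantic_protos, axis=1, keepdims=True)
    return float((1.0 - (v @ s.T).max(1)).mean())

# Synthetic stand-in for CLIP text embeddings of one primitive:
# 60 "descriptions" drawn around 3 latent senses in an 8-D space.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 8)) * 5.0
text_emb = np.concatenate([c + 0.1 * rng.normal(size=(20, 8)) for c in centers])

k = gap_statistic_budget(text_emb, k_max=5, rng=rng)   # per-primitive budget
semantic_protos, _, _ = kmeans(text_emb, k, rng)       # semantic prototypes
visual_protos = semantic_protos.copy()                 # language-guided initialization
loss = anchoring_loss(visual_protos, semantic_protos)  # zero at initialization
print(k, semantic_protos.shape, loss)
```

In this sketch the budget `k` varies with how spread out a primitive's description embeddings are, which mirrors the idea of giving more prototypes to semantically rich primitives. During training, the anchoring term would be added to the task loss so that visual prototypes stay close to their language-derived counterparts; the weighting of that term is not specified in the abstract.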
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 18