Description-Only Supervision: Contrastive Label–Embedding Alignment for Zero-Shot Text Classification

19 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Zero-shot text classification, description-only supervision, contrastive learning, embedding models, label–description alignment, multi-positive InfoNCE, text embeddings, annotation efficiency
TL;DR: We show that lightly fine-tuning embedding models on a handful of natural-language label descriptions via contrastive alignment greatly improves zero-shot text classification.
Abstract: Zero-shot text classification (ZSC) seeks to assign texts to label spaces without relying on task-specific labeled documents. Yet practical deployments of embedding models for classification often fall back on training task-specific classifiers (e.g., linear probes on frozen embeddings) to recover performance, reintroducing annotation costs and undermining the zero-shot setting. We introduce \emph{contrastive label–embedding alignment}, a simple, compute-efficient alternative that uses only a handful of natural-language descriptions per label and no labeled documents. We lightly fine-tune a base embedding model so that label verbalizers and their descriptions are aligned in a shared space: a symmetric multi-positive contrastive objective pulls each verbalizer toward its associated descriptions while pushing it away from others, capturing the many-to-one label–description relation. Across four benchmarks (topic, sentiment, intent, emotion) and ten encoders (22M–600M parameters), as few as five descriptions per label yield consistent gains, improving macro-F1 by $+0.09$ on average over zero-shot baselines, corresponding to relative improvements of roughly 5–13% across models. Compared to a few-shot SetFit baseline with 8 labeled examples per class, our method attains higher mean performance with substantially lower variance across repeated runs, indicating improved stability in low-data regimes. Label descriptions thus serve as the sole supervision signal for learning a label-specific embedding geometry in an off-the-shelf dual encoder, which retains efficient, pre-encodable dual-encoder inference at test time.
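The symmetric multi-positive contrastive objective described in the abstract can be sketched as follows. This is an illustrative NumPy implementation, not the authors' code: the function name, the temperature value, and the exact averaging scheme over positives are assumptions. Each label verbalizer treats its several descriptions as positives (label→description direction), while each description has exactly one positive label (description→label direction), and the two InfoNCE terms are averaged.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def symmetric_multi_positive_infonce(label_emb, desc_emb, desc_to_label, tau=0.07):
    """Illustrative symmetric multi-positive InfoNCE.

    label_emb:     (L, d) verbalizer embeddings, one per label
    desc_to_label: (N,)   index of the label each description belongs to
    desc_emb:      (N, d) description embeddings (many per label)
    """
    z = l2_normalize(label_emb)
    d = l2_normalize(desc_emb)
    logits = z @ d.T / tau  # (L, N) temperature-scaled cosine similarities
    # (L, N) mask marking which descriptions are positives for each label
    pos = desc_to_label[None, :] == np.arange(len(z))[:, None]

    # label -> description: average log-softmax over each label's positives
    log_p_ld = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss_ld = -(log_p_ld * pos).sum(axis=1) / pos.sum(axis=1)

    # description -> label: each description has exactly one positive label
    logits_dl = logits.T  # (N, L)
    log_p_dl = logits_dl - np.log(np.exp(logits_dl).sum(axis=1, keepdims=True))
    loss_dl = -log_p_dl[np.arange(len(d)), desc_to_label]

    return 0.5 * (loss_ld.mean() + loss_dl.mean())
```

In a fine-tuning loop, both embedding sets would come from the same trainable encoder, so minimizing this loss pulls each verbalizer toward its own descriptions and apart from the others, as the abstract describes.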
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 19877