Description-Only Supervision: Contrastive Label–Embedding Alignment for Zero-Shot Text Classification
Keywords: Zero-shot text classification, description-only supervision, contrastive learning, embedding models, label–description alignment, multi-positive InfoNCE, text embeddings, annotation efficiency
TL;DR: We show that lightly finetuning embedding models on a handful of natural-language label descriptions via contrastive alignment greatly improves zero-shot text classification.
Abstract: Zero-shot text classification (ZSC) aims to assign labels without task-specific annotation by exploiting the semantics of human-readable labels. In practice, embedding-based ZSC often falls back on training a linear probe, reintroducing annotation costs. We propose \emph{description-only supervision}, a simple, compute-efficient alternative that requires only a handful of natural-language descriptions per label. We lightly finetune a base embedding model with a contrastive objective that pulls each label verbalizer toward its associated descriptions while pushing it away from others, using a multi-positive formulation to capture the many-to-one relation between descriptions and labels. Across four benchmarks (topic, sentiment, intent, emotion) and ten encoders (22M–600M parameters), as few as five descriptions per label yield consistent gains, improving macro-F1 by +0.10 on average over zero-shot baselines. Compared to a few-shot SetFit baseline with 8 examples per class, our method attains higher mean performance with substantially lower variance across 20 runs, indicating improved stability in low-data regimes. The approach preserves the dual-encoder advantage (pre-encodable labels/documents), avoids labeled documents entirely, and adds minimal engineering overhead.
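For concreteness, the sketch below shows one way the multi-positive contrastive objective described in the abstract could be implemented in PyTorch. It is a minimal illustration under our own assumptions: the function and variable names (`multi_positive_info_nce`, `desc_label_ids`) and the temperature value are hypothetical, not taken from the paper's released code.

```python
import torch
import torch.nn.functional as F

def multi_positive_info_nce(label_emb: torch.Tensor,
                            desc_emb: torch.Tensor,
                            desc_label_ids: torch.Tensor,
                            temperature: float = 0.05) -> torch.Tensor:
    """Multi-positive InfoNCE over label/description embeddings.

    label_emb      -- (L, d) embeddings of the L label verbalizers
    desc_emb       -- (N, d) embeddings of the N label descriptions
    desc_label_ids -- (N,)   index of the label each description belongs to
    """
    # Cosine similarity between every verbalizer and every description.
    label_emb = F.normalize(label_emb, dim=-1)
    desc_emb = F.normalize(desc_emb, dim=-1)
    logits = label_emb @ desc_emb.T / temperature                # (L, N)

    # pos_mask[l, n] = 1 iff description n describes label l
    # (several positives per row: the many-to-one relation).
    labels = torch.arange(label_emb.size(0), device=desc_label_ids.device)
    pos_mask = (desc_label_ids.unsqueeze(0) == labels.unsqueeze(1)).float()

    # Average log-probability of each positive under the softmax over all
    # descriptions: positives are pulled in, the rest are pushed away.
    log_prob = logits.log_softmax(dim=-1)
    per_label = -(pos_mask * log_prob).sum(-1) / pos_mask.sum(-1).clamp(min=1)
    return per_label.mean()

# Toy usage: 4 labels with 5 descriptions each (random embeddings).
L, d, k = 4, 384, 5
loss = multi_positive_info_nce(
    torch.randn(L, d),                       # verbalizer embeddings
    torch.randn(L * k, d),                   # description embeddings
    torch.arange(L).repeat_interleave(k),    # description -> label map
)
```

After finetuning with such a loss, inference keeps the dual-encoder advantage noted in the abstract: verbalizer embeddings can be pre-computed once, and each document is classified by an argmax over its cosine similarities to them.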
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 19877