Exploiting Category Names for Few-Shot Classification with Vision-Language Models

Taihong Xiao; Zirui Wang; Liangliang Cao; Jiahui Yu; Shengyang Dai; Ming-Hsuan Yang

Published: 06 Mar 2023, Last Modified: 06 Jul 2025, MRL 2023, Readers: Everyone

Keywords: Few-shot learning, vision-language model, category names

TL;DR: Category names could significantly help few-shot learning of vision-language models.

Abstract: Vision-language foundation models pretrained on large-scale data provide a powerful tool for many visual understanding tasks. Notably, many vision-language models use two encoders (visual and textual) that map the two modalities into a shared embedding space. As a result, the learned representations achieve good zero-shot performance on tasks like image classification. However, when there are only a few examples per category, large vision-language models often fall short of their potential, mainly because of the mismatch between the large number of parameters and the relatively small amount of training data. This paper shows that we can significantly improve the performance of few-shot classification by using the category names to initialize the classification head. With the proposed category name initialization method, our model achieves state-of-the-art performance on a number of few-shot image classification benchmarks (e.g., 87.37% on ImageNet and 96.08% on Stanford Cars, both using five-shot learning).
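The idea of initializing the classification head from category names can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_text_encoder` is a hypothetical stand-in for a pretrained text encoder (it only hashes the string into a unit vector), and `EMBED_DIM`, `init_head_from_names`, and `predict` are names assumed here for illustration. The abstract does not specify the actual models, prompts, or fine-tuning procedure.

```python
import hashlib

import numpy as np

EMBED_DIM = 64  # hypothetical embedding size, chosen only for this sketch


def toy_text_encoder(text: str, dim: int = EMBED_DIM) -> np.ndarray:
    """Stand-in for a pretrained text encoder (an assumption, not a real model).

    Hashes the text into a seed and draws a deterministic unit vector.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    vec = np.random.default_rng(seed).standard_normal(dim)
    return vec / np.linalg.norm(vec)


def init_head_from_names(category_names, encoder=toy_text_encoder) -> np.ndarray:
    """Build head weights with one row per category: the name's text embedding."""
    return np.stack([encoder(name) for name in category_names])


def predict(image_embeddings: np.ndarray, head_weights: np.ndarray) -> np.ndarray:
    """Classify by the largest dot product between image and class embeddings."""
    logits = image_embeddings @ head_weights.T
    return logits.argmax(axis=1)


names = ["golden retriever", "tabby cat", "sports car"]
head = init_head_from_names(names)

# Simulated image embeddings: each lies near its class's text embedding,
# mimicking a shared image-text embedding space.
rng = np.random.default_rng(0)
images = head + 0.01 * rng.standard_normal(head.shape)
images /= np.linalg.norm(images, axis=1, keepdims=True)

preds = predict(images, head)
print(preds)
```

In a few-shot setting, a head initialized this way would then be fine-tuned on the handful of labeled images, so training starts from the zero-shot classifier rather than from random weights.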

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/exploiting-category-names-for-few-shot/code)
