ChatGPT-Powered Hierarchical Comparisons for Image Classification

Published: 21 Sept 2023, Last Modified: 02 Nov 2023, NeurIPS 2023 poster
Keywords: ChatGPT, Hierarchical Comparisons, Image Classification, Zero shot
TL;DR: A training-free, explainable, and effective zero-shot image classification method with enriched hierarchical descriptions powered by LLMs.
Abstract: The zero-shot open-vocabulary setting poses challenges for image classification. Fortunately, a vision-language model like CLIP, pre-trained on image-text pairs, can classify images by comparing image and text embeddings. Leveraging large language models (LLMs) such as ChatGPT can further enhance CLIP’s accuracy by incorporating class-specific knowledge in descriptions. However, CLIP still exhibits a bias towards certain classes, and LLMs generate similar descriptions for similar classes, disregarding their differences. To address this problem, we present a novel image classification framework via hierarchical comparisons. By recursively comparing and grouping classes with LLMs, we construct a class hierarchy. With such a hierarchy, we can classify an image by descending from the top to the bottom of the hierarchy, comparing image and text embeddings at each level. Through extensive experiments and analyses, we demonstrate that our proposed approach is intuitive, effective, and explainable. Code will be released upon publication.
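The top-down classification step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Node` class, the `classify` function, the greedy "pick the best child" rule, and the hand-written 3-D vectors are all assumptions made for illustration. In practice the vectors would come from CLIP's image and text encoders, and the hierarchy would be built by prompting an LLM to compare and group classes.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One group or class in a hypothetical class hierarchy."""
    name: str
    text_emb: list  # stand-in for the text embedding of this node's description
    children: list = field(default_factory=list)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def classify(image_emb, root):
    """Descend from the root, following the best-matching child at each level."""
    node, path = root, []
    while node.children:
        node = max(node.children, key=lambda c: cosine(image_emb, c.text_emb))
        path.append(node.name)
    return node.name, path


# Toy hierarchy with made-up vectors (assumption: real ones would be CLIP embeddings).
# The root's own embedding is never compared, so its value is a placeholder.
root = Node("root", [0.0, 0.0, 0.0], [
    Node("dogs", [1.0, 0.2, 0.0], [
        Node("husky", [1.0, 0.0, 0.3]),
        Node("corgi", [1.0, 0.5, 0.0]),
    ]),
    Node("birds", [0.0, 1.0, 0.2], [
        Node("sparrow", [0.0, 1.0, 0.0]),
        Node("eagle", [0.2, 1.0, 0.6]),
    ]),
])

label, path = classify([0.9, 0.1, 0.4], root)
print(label, path)  # husky ['dogs', 'husky']
```

The returned path is what makes this style of prediction easy to inspect: each entry records which group won the comparison at that level of the hierarchy.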
Supplementary Material: zip
Submission Number: 10004