ChatGPT-Powered Hierarchical Comparisons for Image Classification

Zhiyuan Ren; Yiyang Su; Xiaoming Liu

Published: 21 Sept 2023, Last Modified: 02 Nov 2023

Venue: NeurIPS 2023 poster

Keywords: ChatGPT, Hierarchical Comparisons, Image Classification, Zero shot

TL;DR: A training-free, explainable, and effective zero-shot image classification method with enriched hierarchical descriptions powered by LLMs.

Abstract: The zero-shot open-vocabulary setting poses challenges for image classification. Fortunately, utilizing a vision-language model like CLIP, pre-trained on image-text pairs, allows for classifying images by comparing embeddings. Leveraging large language models (LLMs) such as ChatGPT can further enhance CLIP’s accuracy by incorporating class-specific knowledge in descriptions. However, CLIP still exhibits a bias towards certain classes and generates similar descriptions for similar classes, disregarding their differences. To address this problem, we present a novel image classification framework via hierarchical comparisons. By recursively comparing and grouping classes with LLMs, we construct a class hierarchy. With such a hierarchy, we can classify an image by descending from the top to the bottom of the hierarchy, comparing image and text embeddings at each level. Through extensive experiments and analyses, we demonstrate that our proposed approach is intuitive, effective, and explainable. Code will be released upon publication.
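
The top-down classification step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Node` structure, the `classify` function, the hand-written hierarchy, and the 3-dimensional toy vectors are all assumptions made for illustration. In the paper's setting the hierarchy would be built by an LLM and the embeddings would come from CLIP; here a purely greedy descent stands in for whatever scoring rule the method actually uses.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One group or class in the hierarchy (hypothetical structure)."""
    name: str
    text_embedding: List[float]  # stand-in for a CLIP text embedding
    children: List["Node"] = field(default_factory=list)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def classify(image_embedding, root):
    """Greedy descent: at each level, move to the child whose text
    embedding is most similar to the image embedding, until a leaf."""
    node, path = root, []
    while node.children:
        node = max(node.children,
                   key=lambda c: cosine(image_embedding, c.text_embedding))
        path.append(node.name)
    return node.name, path


# Toy hierarchy with made-up embeddings (assumption: real ones come from CLIP).
root = Node("root", [0.0, 0.0, 0.0], [
    Node("animals", [1.0, 0.0, 0.0], [
        Node("cat", [0.9, 0.1, 0.4]),
        Node("dog", [0.9, 0.1, -0.4]),
    ]),
    Node("vehicles", [0.0, 1.0, 0.0], [
        Node("car", [0.1, 0.9, 0.4]),
        Node("truck", [0.1, 0.9, -0.4]),
    ]),
])

label, path = classify([0.8, 0.2, 0.5], root)
print(label, path)
```

The returned `path` records the group chosen at each level, which is one way to see why a hierarchical decision is easier to explain than a single flat comparison over all classes.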

Supplementary Material: zip

Submission Number: 10004
