Few-shot Fine-grained Image Classification with Interpretable Prompt Learning through Distribution Alignment

TMLR Paper4217 Authors

15 Feb 2025 (modified: 22 Mar 2025) · Withdrawn by Authors · CC BY 4.0
Abstract: Explainable few-shot fine-grained image classification is an essential task for aligning AI with human preferences: it enables precise recognition of subtle differences and provides explanations for decisions. Existing supervised models often struggle in few-shot scenarios because they rely on extensive labeled data, which is intractable to collect for customized human preferences. Meanwhile, large vision-language models (VLMs), while robust in zero-shot tasks, fail to capture the subtle differences required for fine-grained classification. In this work, we introduce a novel approach that enhances AI alignment in both zero-shot and few-shot fine-grained image classification by leveraging explainable prompt learning and distribution alignment techniques. Specifically, we use a pre-trained LLM to expand the label space in a training-free manner, addressing the disparity between the plain-text and image-text corpus distributions. This is further enhanced by a few-shot learning pipeline that combines prompt learning with a weighted distribution alignment mechanism between image and text representations for better alignment with human-like understanding. The proposed approach not only addresses the limitations of current prompting techniques but also enhances interpretability. Extensive experiments demonstrate the effectiveness of our method and illustrate the interpretability of our descriptions.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Han-Jia_Ye1
Submission Number: 4217