Rethinking Semantic Few-Shot Image Classification

22 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: few-shot image classification, contrastive learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Few-shot learning aims to train models that generalize to novel classes with only a few samples. Recently, a line of work has proposed to enhance few-shot learning with semantic information from class names. However, these works focus on injecting semantic information into existing modules of the standard few-shot learning framework, such as visual prototypes and feature extractors, which requires complex fusion mechanisms. In this paper, we propose a novel few-shot learning framework that uses public textual encoders based on contrastive learning. To address the challenge of aligning visual features with textual embeddings obtained from public textual encoders, we carefully design the textual branch of our framework and introduce a metric module that generalizes cosine similarity. For better transferability, we let the metric module adapt to different few-shot tasks and adopt MAML to train the model via bi-level optimization. We conduct extensive experiments on multiple benchmarks to demonstrate the effectiveness of our method.
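The abstract mentions a metric module that generalizes cosine similarity for comparing visual features with textual embeddings. The paper's exact design is not given here, so the following is only an illustrative sketch of one such generalization: a learnable per-dimension reweighting applied before the normalized dot product, which reduces exactly to plain cosine similarity at its identity initialization. The class name `MetricModule` and this particular parameterization are assumptions, not the authors' method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetricModule(nn.Module):
    """Illustrative metric generalizing cosine similarity (hypothetical design).

    A learnable diagonal weight w rescales each feature dimension before
    L2 normalization; with w = 1 (the initialization) this is exactly
    cosine similarity. The weights could be meta-learned per task,
    e.g. in the inner loop of MAML-style bi-level optimization.
    """

    def __init__(self, dim: int):
        super().__init__()
        # Identity initialization: the module starts as plain cosine similarity.
        self.w = nn.Parameter(torch.ones(dim))

    def forward(self, visual: torch.Tensor, textual: torch.Tensor) -> torch.Tensor:
        # visual: (n_query, dim) image features; textual: (n_class, dim) class embeddings.
        v = F.normalize(visual * self.w, dim=-1)
        t = F.normalize(textual * self.w, dim=-1)
        # Returns a (n_query, n_class) similarity matrix usable as logits.
        return v @ t.t()
```

Because the module is initialized at the identity, its output before any adaptation coincides with standard cosine similarity between the two sets of features.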
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5523