Adaptive Cross-Modal Few-Shot Learning

Chen Xing; Negar Rostamzadeh; Boris N. Oreshkin; Pedro O. Pinheiro

Published: 17 Apr 2019, Last Modified: 23 Mar 2025 | LLD 2019 | Readers: Everyone

Keywords: few-shot learning, cross-modality

Abstract: Metric-based meta-learning techniques have been successfully applied to few-shot classification problems. However, leveraging cross-modal information in a few-shot setting has yet to be explored. When the support from visual information is limited in few-shot image classification, semantic representations (learned from unsupervised text corpora) can provide strong prior knowledge and context to help learning. Based on this intuition, we design a model that is able to leverage visual and semantic features in the context of few-shot classification. We propose an adaptive mechanism that effectively combines both modalities conditioned on categories. Through a series of experiments, we show that our method boosts the performance of metric-based approaches by effectively exploiting language structure. Using this extra modality, our model surpasses current unimodal state-of-the-art methods by a large margin on miniImageNet. The improvement in performance is particularly large when the number of shots is small.
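The abstract describes the adaptive mechanism only at a high level, so the following is a minimal sketch of one way a category-conditioned combination of the two modalities could look, not the authors' implementation. It assumes each class has a visual prototype and a semantic embedding already projected into the same space, and it blends them with a per-class weight produced by a sigmoid gate. The function names, the linear gate (`gate_w`, `gate_b`), and the nearest-prototype classifier are all illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def adaptive_prototypes(visual_protos, semantic_embs, gate_w, gate_b):
    """Blend per-class visual prototypes with semantic embeddings.

    visual_protos: (C, D) array, e.g. the mean support feature per class.
    semantic_embs: (C, D) array, word embeddings assumed to be already
                   projected into the visual feature space.
    gate_w, gate_b: parameters of a hypothetical linear gate (assumption).
    """
    # One mixing weight per class, computed from that class's semantics.
    lam = sigmoid(semantic_embs @ gate_w + gate_b)[:, None]  # shape (C, 1)
    # Convex combination: lam near 1 trusts vision, near 0 trusts language.
    protos = lam * visual_protos + (1.0 - lam) * semantic_embs
    return protos, lam


def classify(queries, protos):
    """Assign each query (N, D) to its nearest prototype (squared Euclidean)."""
    dists = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)
```

Because the weight is computed per class, a model of this shape can lean on the semantic embedding for classes whose few support images are uninformative and on the visual prototype otherwise, which is consistent with the abstract's claim that the gains are largest when the number of shots is small.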

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/adaptive-cross-modal-few-shot-learning/code)
