Abstract: In few-shot learning, semantic-based methods have shown excellent performance by fusing the visual and semantic modalities. In single-shot learning tasks, however, the visual side of this fusion fails to capture class information comprehensively, since only one image is available. To address this issue, we propose a semantic-based single-shot method that considers both local and global perspectives. Specifically, we fully exploit local visual features in place of the image-level features traditionally used for modality fusion in semantic-based methods. Moreover, we introduce a global classification loss that enlarges the encoding space, yielding accurate and distinguishable local embeddings. Through a series of experiments, we show that by exploiting local features from a global classification perspective, our model improves on semantic-based approaches by a large margin on two different data sets, and that the global classification loss is effective under both metrics.
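To make the two ideas in the abstract concrete, the following is a minimal sketch, not the paper's actual implementation: it fuses a grid of local visual features (rather than a single pooled image-level feature) with a projected semantic embedding, and attaches an auxiliary global classification head over all base classes. Every name here (LocalFusionNet, NUM_GLOBAL_CLASSES, the additive fusion, the dimensions) is an illustrative assumption, not taken from the paper.

```python
# Hypothetical sketch of local-feature modality fusion with a global
# classification loss; all module names and dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_GLOBAL_CLASSES = 64   # assumed: number of base (training) classes
SEMANTIC_DIM = 300        # assumed: e.g. word-embedding dimension
FEAT_DIM = 128            # assumed: channel dimension of local feature maps

class LocalFusionNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Toy convolutional backbone producing a grid of local features
        self.backbone = nn.Sequential(
            nn.Conv2d(3, FEAT_DIM, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(FEAT_DIM, FEAT_DIM, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # Projects the semantic embedding into the visual feature space
        self.sem_proj = nn.Linear(SEMANTIC_DIM, FEAT_DIM)
        # Global classifier over all base classes (drives the auxiliary loss)
        self.global_head = nn.Linear(FEAT_DIM, NUM_GLOBAL_CLASSES)

    def forward(self, images, semantic):
        # images: (B, 3, H, W); semantic: (B, SEMANTIC_DIM)
        fmap = self.backbone(images)                 # (B, C, h, w)
        local = fmap.flatten(2).transpose(1, 2)      # (B, h*w, C) local features
        sem = self.sem_proj(semantic)                # (B, C)
        # Fuse each local feature with the class semantics (simple additive fusion)
        fused = local + sem.unsqueeze(1)             # (B, h*w, C)
        proto = fused.mean(dim=1)                    # prototype built from local features
        logits = self.global_head(proto)             # global classification logits
        return proto, logits

model = LocalFusionNet()
images = torch.randn(4, 3, 32, 32)
semantic = torch.randn(4, SEMANTIC_DIM)
labels = torch.randint(0, NUM_GLOBAL_CLASSES, (4,))
proto, logits = model(images, semantic)
# The auxiliary global classification loss pushes local embeddings to be
# distinguishable across all base classes, enlarging the encoding space
global_loss = F.cross_entropy(logits, labels)
print(proto.shape, global_loss.item())
```

In this sketch, the prototype is averaged over fused local features instead of a single global-pooled vector, and the cross-entropy over all base classes plays the role of the global classification loss described above; the actual fusion operator and loss weighting in the paper may differ.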