Multi-Modal Few-Shot Learning: A Benchmark
Frederik Pahde, Moin Nabi, Tassilo Klein
Feb 12, 2018 (modified: Feb 12, 2018)
ICLR 2018 Workshop Submission
Readers: everyone
Abstract: State-of-the-art deep learning algorithms generally require large amounts of training data, and a lack of data can severely degrade their performance. To address this, we propose a multi-modal approach that bridges the information gap by means of meaningful joint embeddings. Specifically, we present a benchmark that is multi-modal at training time (i.e. images and texts) and single-modal at test time (i.e. images). The associated task is to use the multi-modal data available for base classes (with many samples) to learn explicit visual classifiers for novel classes (with few samples). Next, we propose a framework built upon the idea of cross-modal data hallucination. In this regard, we introduce a discriminative text-conditional GAN for sample generation, together with a simple self-paced strategy for sample selection. Experiments on our proposed benchmark demonstrate that learning generative models in a cross-modal fashion facilitates few-shot learning by compensating for the lack of data in novel categories.
TL;DR: We propose a benchmark for few-shot learning in multi-modal scenarios, together with an approach that combines a discriminative text-conditional GAN for cross-modal sample generation with a simple self-paced strategy for sample selection.
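The abstract does not spell out how the self-paced selection step works, so the following is only a minimal sketch of one plausible reading: generated images are ranked by how confidently a classifier assigns them to the class they were generated for, and only the top few per class are kept. The function name `select_confident_samples`, the parameter `per_class_k`, and the use of classifier probability as the ranking score are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def select_confident_samples(class_probs, target_labels, per_class_k):
    """Keep, for each class, the k generated samples scored highest.

    Hypothetical helper, not the authors' implementation.

    class_probs:   (n_samples, n_classes) classifier probabilities for
                   the generated images.
    target_labels: (n_samples,) class index each image was generated for.
    per_class_k:   number of samples to keep per class (assumed knob).
    Returns a sorted list of selected sample indices.
    """
    class_probs = np.asarray(class_probs, dtype=float)
    target_labels = np.asarray(target_labels)

    # Confidence = probability assigned to the intended class.
    rows = np.arange(len(target_labels))
    confidence = class_probs[rows, target_labels]

    selected = []
    for c in np.unique(target_labels):
        idx = np.flatnonzero(target_labels == c)
        # Order this class's samples from most to least confident.
        order = idx[np.argsort(-confidence[idx], kind="stable")]
        selected.extend(order[:per_class_k].tolist())
    return sorted(selected)


# Toy example: five generated images, two novel classes.
probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.45, 0.55], [0.7, 0.3]]
labels = [0, 0, 1, 1, 0]
chosen = select_confident_samples(probs, labels, per_class_k=1)
```

In a self-paced loop, the selected samples would typically be added to the few real novel-class images, the classifier retrained, and the ranking repeated; whether the paper iterates in this way is not stated in the abstract.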