A Meta-transfer Learning Framework for Visually Grounded Compositional Concept Learning

Anonymous

16 Jan 2022 (modified: 05 May 2023) · ACL ARR 2022 January Blind Submission
Abstract: Humans acquire language in a compositional and grounded manner. They can describe their perceptual world using novel compositions of already-learned elementary concepts. However, recent research shows that modern neural networks lack such compositional generalization ability. To address this challenge, we propose MetaVL, a meta-transfer learning framework that trains transformer-based vision-and-language (V&L) models with an optimization-based meta-learning method and episodic training. We carefully construct two datasets based on MSCOCO and Flickr30K to specifically target novel compositional concept learning. Our empirical results show that MetaVL outperforms baseline models on both datasets. Moreover, MetaVL demonstrates higher sample efficiency than supervised learning, especially in the few-shot setting.
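The abstract names two ingredients: optimization-based meta-learning and episodic training. For readers unfamiliar with that recipe, below is a minimal, self-contained PyTorch sketch of a MAML-style episode on a toy regression task; the inner loop adapts parameters on a support set and the outer loop updates the meta-parameters from the query loss. The toy model, episode sampler, and hyperparameters are hypothetical illustrations of the general technique, not MetaVL's actual implementation.

```python
# Minimal MAML-style episodic meta-training sketch (hypothetical toy task,
# not MetaVL's code): adapt on a support set, meta-update from the query loss.
import torch
import torch.nn.functional as F

def forward(x, w, b):
    # Toy linear model, kept functional so adapted parameters can be passed in.
    return x @ w + b

def maml_episode(w, b, support, query, inner_lr=1e-2):
    """One episode: adapt (w, b) on the support set, score on the query set."""
    xs, ys = support
    support_loss = F.mse_loss(forward(xs, w, b), ys)
    # create_graph=True retains the graph so the outer update can take
    # second-order gradients through the inner adaptation step.
    gw, gb = torch.autograd.grad(support_loss, (w, b), create_graph=True)
    w_adapted, b_adapted = w - inner_lr * gw, b - inner_lr * gb

    xq, yq = query
    return F.mse_loss(forward(xq, w_adapted, b_adapted), yq)

# Meta-parameters shared across all episodes.
w = torch.randn(3, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([w, b], lr=1e-3)

for step in range(100):
    # Each episode is a fresh "task": here a random linear map, standing in
    # (hypothetically) for sampling a novel concept composition.
    true_w = torch.randn(3, 1)
    xs, xq = torch.randn(8, 3), torch.randn(8, 3)
    support, query = (xs, xs @ true_w), (xq, xq @ true_w)

    loss = maml_episode(w, b, support, query)
    opt.zero_grad()
    loss.backward()  # backprop through the inner step (second-order MAML)
    opt.step()
```

In the paper's setting, the analogous episode would be built from held-out compositional concepts rather than random linear maps, so the meta-learned initialization can adapt to a novel composition from a few grounded examples.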
Paper Type: long