2021 (modified: 16 Nov 2021), ICLR 2021, Readers: Everyone
Abstract: The recent success of Transformers in the language domain has motivated adapting them to a multimodal setting, where a new visual model is trained in tandem with an already pretrained language model....