2022 (modified: 03 Nov 2022), ICLR 2022
Abstract: Unsupervised large-scale vision-language pre-training has shown promising advances on various downstream tasks. Existing methods often model the cross-modal interaction either via the similarity of...