Published: 2022, Last Modified: 04 May 2023, ICLR 2022, Readers: Everyone
Abstract: Unsupervised large-scale vision-language pre-training has shown promising advances on various downstream tasks. Existing methods often model the cross-modal interaction either via the similarity of...