Published: 01 Jan 2023, Last Modified: 10 Nov 2023, ICML 2023, Readers: Everyone
Abstract: Vision-language representation learning models (e.g., CLIP) have achieved state-of-the-art performance on various downstream tasks, which usually need large-scale training data to learn discriminat...