ZERO: A Large-scale Chinese Cross-modal Benchmark with a New Vision-Language Framework

22 Sept 2022 (modified: 13 Feb 2023), ICLR 2023 Conference Withdrawn Submission, Readers: Everyone
Abstract: Vision-language pre-training (VLP) on large-scale datasets has shown superior performance on various downstream tasks. In contrast to the many available benchmarks built on English corpora, large-scale pre-training and downstream datasets with Chinese corpora remain largely unexplored. In this paper, we build a large-scale Chinese cross-modal benchmark named ZERO, a database we make publicly available for the research community to build VLP models. We release a pre-training dataset and five fine-tuning datasets for downstream tasks, and also develop a pre-training framework of pre-Ranking + Ranking with target-guided Distillation and feature-guided Distillation (R2D2) for cross-modal learning. Specifically, a global contrastive pre-ranking is introduced to learn individual representations of images and texts. We then fuse the representations in a fine-grained ranking manner via an image-text cross encoder and a text-image cross encoder. To further enhance the capability of our method, a two-way distillation strategy is used, combining target-guided distillation and feature-guided distillation. We achieve state-of-the-art performance on eleven downstream datasets from four broad categories of tasks including image-text retrieval, image-text matching, image captioning, and text-to-image generation.
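
The global contrastive pre-ranking mentioned in the abstract amounts to a symmetric in-batch image-text contrastive objective. Below is a minimal sketch under that assumption, in the style of a CLIP-like InfoNCE loss; the function name, temperature value, and all implementation details are illustrative and are not taken from the paper's actual R2D2 code.

    # Minimal sketch (assumption): symmetric in-batch image-text contrastive loss,
    # a common realization of global contrastive pre-ranking; not the paper's code.
    import torch
    import torch.nn.functional as F

    def global_contrastive_loss(image_emb: torch.Tensor,
                                text_emb: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
        # Normalize embeddings so dot products become cosine similarities.
        image_emb = F.normalize(image_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)
        # (B, B) similarity matrix; diagonal entries are the matched image-text pairs.
        logits = image_emb @ text_emb.t() / temperature
        targets = torch.arange(logits.size(0), device=logits.device)
        loss_i2t = F.cross_entropy(logits, targets)      # image-to-text direction
        loss_t2i = F.cross_entropy(logits.t(), targets)  # text-to-image direction
        return (loss_i2t + loss_t2i) / 2

The fine-grained ranking and two-way distillation stages then operate on top of these individual representations, as described in the abstract.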
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Supplementary Material: zip
Please Choose The Closest Area That Your Submission Falls Into: Infrastructure (e.g., datasets, competitions, implementations, libraries)