CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: The abundance of English vision-language (VL) understanding benchmark datasets, such as MS-COCO and Flickr30K, has greatly facilitated the evaluation of new vision-language models (VLMs) across diverse tasks. However, despite the rapid development of Chinese VLMs, most existing Chinese VL datasets are constructed by re-annotating images from English VL datasets, restricting the image sources to English-speaking cultures. Others are limited to a few fundamental tasks, such as image-text retrieval. This cultural bias and narrow range of task types make these datasets inadequate for evaluating VLMs in the context of Chinese culture. To remedy this issue, we present the Chinese Vision-Language Understanding Evaluation (CVLUE) benchmark, in which the selection of object categories and images is driven entirely by Chinese native speakers, ensuring that the source images are representative of Chinese culture. The benchmark covers four distinct VL tasks, image-text retrieval, visual question answering, visual grounding, and visual dialogue, evaluating a model's VL capabilities from multiple aspects. We present a detailed statistical analysis of CVLUE and a baseline performance analysis of several open-source multilingual VLMs on CVLUE and its English counterparts, revealing the performance gap between English and Chinese.
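As an illustration of the kind of baseline evaluation the abstract describes, the sketch below shows how Recall@K might be computed for the image-text retrieval task from a precomputed similarity matrix. This is a minimal, hypothetical example, not the authors' evaluation code: the `similarity` matrix, its square shape, and the assumption that each caption's matching image shares its index are all illustrative.

```python
import numpy as np

def recall_at_k(similarity: np.ndarray, k: int) -> float:
    """Recall@K for text-to-image retrieval.

    Assumes similarity[i, j] scores caption i against image j, and
    that the ground-truth image for caption i is column i (a
    simplifying assumption made for this illustration only).
    """
    # Rank images for each caption by descending similarity.
    ranked = np.argsort(-similarity, axis=1)
    # A hit occurs when the ground-truth index appears in the top k.
    hits = (ranked[:, :k] == np.arange(len(similarity))[:, None]).any(axis=1)
    return float(hits.mean())

# Toy usage: 4 captions x 4 images with random similarity scores.
rng = np.random.default_rng(0)
sim = rng.normal(size=(4, 4))
print(recall_at_k(sim, k=1), recall_at_k(sim, k=3))
```

Image-to-text retrieval follows symmetrically by transposing the similarity matrix before calling the same function.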
Paper Type: long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Contribution Types: Data resources, Data analysis
Languages Studied: English, Chinese
Preprint Status: There is no non-anonymous preprint and we do not intend to release one.
A1: yes
A1 Elaboration For Yes Or No: 8
A2: yes
A2 Elaboration For Yes Or No: 7
A3: yes
A3 Elaboration For Yes Or No: 1
B: yes
B1: yes
B1 Elaboration For Yes Or No: 5
B2: yes
B2 Elaboration For Yes Or No: 7
B3: yes
B3 Elaboration For Yes Or No: 5
B4: yes
B4 Elaboration For Yes Or No: 7
B5: yes
B5 Elaboration For Yes Or No: 4
B6: yes
B6 Elaboration For Yes Or No: 3
C: yes
C1: yes
C1 Elaboration For Yes Or No: 5
C2: yes
C2 Elaboration For Yes Or No: 5
C3: yes
C3 Elaboration For Yes Or No: 5
C4: n/a
D: yes
D1: yes
D1 Elaboration For Yes Or No: 3
D2: yes
D2 Elaboration For Yes Or No: 7
D3: yes
D3 Elaboration For Yes Or No: 7
D4: n/a
D5: yes
D5 Elaboration For Yes Or No: 3
E: no
E1: n/a