Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset

Anonymous

16 Jan 2022 (modified: 05 May 2023) · ACL ARR 2022 January Blind Submission
Abstract: Research in massively multilingual image captioning has been severely hampered by the lack of high-quality evaluation datasets. In this paper, we present and make available the Crossmodal-3600 dataset, a geographically diverse set of 3,600 images, each annotated with human-generated reference captions in 36 languages. We select a representative set of images from around the world for this dataset and annotate it with captions that are stylistically consistent across all languages, while avoiding annotation artifacts caused by direct translation. We apply this benchmark to model selection for massively multilingual image captioning models, and show superior correlation with human evaluations when the Crossmodal-3600 captions are used as golden references for automatic metrics.
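To make the evaluation setup in the abstract concrete, the sketch below shows one way such a benchmark can be used: score each candidate model's captions against the Crossmodal-3600 references with an automatic metric, then correlate those scores with human judgments at the system level. This is not the paper's code; the file names, the `human_scores` values, and the choice of BLEU (via sacrebleu) with Kendall's tau are illustrative assumptions.

```python
# Minimal sketch, assuming per-model caption files and human ratings exist.
# The paper's actual metric suite and correlation protocol may differ.
import json

import sacrebleu                    # pip install sacrebleu
from scipy.stats import kendalltau  # pip install scipy


def metric_score(hypotheses, references):
    """Corpus-level BLEU of model captions against golden references."""
    # sacrebleu expects one list of reference strings per reference set.
    return sacrebleu.corpus_bleu(hypotheses, [references]).score


# Hypothetical inputs: three candidate models and their human eval scores.
models = ["model_a", "model_b", "model_c"]
human_scores = {"model_a": 0.71, "model_b": 0.64, "model_c": 0.58}

# Hypothetical file holding one XM3600 reference caption per image (English).
with open("xm3600_references.en.json") as f:
    references = json.load(f)

auto_scores = []
for name in models:
    with open(f"{name}_captions.en.json") as f:
        hypotheses = json.load(f)  # one caption per image, same image order
    auto_scores.append(metric_score(hypotheses, references))

# System-level correlation between the automatic metric and human ratings.
tau, p_value = kendalltau(auto_scores, [human_scores[m] for m in models])
print(f"Kendall tau = {tau:.3f} (p = {p_value:.3f})")
```

Note that Crossmodal-3600 provides multiple references per image and per language; a fuller evaluation would pass all reference streams to the metric and repeat this loop per language.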
Paper Type: long