Multilingual evaluation of image captioning: How far can we get with CLIP models?

ACL ARR 2024 June Submission 2112 Authors

15 Jun 2024 (modified: 02 Aug 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: The evaluation of image captions, which considers both linguistic fluency and semantic correspondence to visual contents, has received significant research attention. Still, despite advancements such as the CLIPScore metric, multilingual captioning evaluation remains relatively unexplored. This work assesses the use of CLIPScore in multilingual captioning, evaluating different models in a variety of settings. To address the lack of multilingual test data, we consider two strategies: (1) using machine-translated datasets with human judgements, and (2) re-purposing multilingual datasets that target inference and reasoning. Our results show that multilingual CLIP models can perform on par with their English-centric counterparts on English benchmarks while also allowing for multilingual assessments. Performance increases with model finetuning and with model size. Larger models, trained with more data, attain performance similar to that of more advanced methods that extend the original CLIPScore. Tests with machine-translated data show that multilingual CLIPScore can also maintain a high correlation with human judgements across different languages, and additional tests with natively multilingual and multicultural data further attest to the quality of its assessments.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Multimodality and Language Grounding to Vision, Multilingualism and Cross-Lingual NLP, Resources and Evaluation
Contribution Types: Model analysis & interpretability, Reproduction study, Approaches to low-resource settings, Data resources, Data analysis
Languages Studied: English, German, French, Spanish, Chinese, Portuguese, Italian, Russian, Korean, Dutch, Indonesian, Mandarin, Swahili, Tamil, Turkish, Arabic
Submission Number: 2112