A Parallel Cross-Lingual Benchmark for Multimodal Idiomaticity Understanding

Dilara Torunoglu Selamet; Doğukan Arslan; Rodrigo Wilkens; Wei He; Thomas Pickard; Adriana Silvina Pagano; Aline Villavicencio; Gülşen Eryiğit

Published: 27 May 2026, Last Modified: 29 May 2026, Venue: UniDive 2026, Readers: Everyone, License: CC BY-SA 4.0

Keywords: Multiword Expressions, Machine Translation, Multilingual models, Multimodal models

Working Group: WG1: Corpus annotation, WG3: Multilingual and cross-lingual language technology, WG4: Quantifying and promoting diversity

WG1 Tasks: Task 1.6: Identification and Annotation of MWEs in corpus languages

Abstract: This paper introduces XMPIE, a high-quality benchmark designed to bridge the gap in multilingual and multimodal idiom understanding. Potentially idiomatic expressions (PIEs) pose a significant challenge for NLP because their meanings are rooted in specific language communities and cultural experiences. XMPIE is a parallel dataset covering 34 languages and more than 10,000 items. It enables researchers to analyze idiomatic patterns across languages and to evaluate whether a model’s understanding in one language or modality (text) transfers to another (image). The data were crafted by language experts, with each PIE accompanied by five images spanning a spectrum from idiomatic to literal meaning, including distractors.

WG3 Tasks: Task 3.5 Evaluation campaign: AdMIRe - Advancing Multimodal Idiomaticity Representation

WG4 Tasks: Task 4.1: Promoting low-resourced/endangered languages

Tracks For Type Of Contribution: Complete work (including previously published work)

Do You Need Visa To Attend The 4th UniDive General Meeting In Romania: Yes

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 57
