A Zero-Shot LLM Pipeline for Multimodal Idiom Understanding and Ranking

Atakan Site; Oğuz Ali Arslan; Gülşen Eryiğit

Published: 27 May 2026, Last Modified: 27 May 2026 | UniDive 2026 | Everyone | CC BY-SA 4.0

Keywords: Multimodal Idiom Understanding, Potentially Idiomatic Expressions (PIEs), Zero-Shot Learning, Large Vision-Language Models (LVLMs), Chain-of-Thought Prompting

Working Group: WG3: Multilingual and cross-lingual language technology

Abstract: This paper presents our system for AdMIRe 2 (Advancing Multimodal Idiomaticity Representation), a shared task on multilingual multimodal idiom understanding. The task focuses on ranking images according to how well they depict the literal or idiomatic usage of potentially idiomatic expressions (PIEs) in context, across 15 languages and two tracks: a text-only track, and a multimodal track that uses both images and captions. To tackle both tracks, we propose a hybrid zero-shot pipeline built on large vision–language models (LVLMs). Our system employs a chain-of-thought prompting scheme that first classifies each PIE usage as literal or idiomatic and then ranks candidate images by their alignment with the inferred meaning. A primary–fallback routing mechanism increases robustness to safety-filter refusals, while lightweight post-processing recovers consistent rankings from imperfect model outputs. Without any task-specific fine-tuning, our approach achieves 55.9% Top-1 Accuracy in the text-only track and 60.1% in the multimodal (text+image) track, ranking first overall on the official leaderboard. These results suggest that carefully designed zero-shot LVLM pipelines can provide strong baselines for multilingual multimodal idiomaticity benchmarks.
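
The primary–fallback routing and ranking post-processing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`route`, `repair_ranking`, `rank_images`), the convention that a refused request is signalled by `None`, and the repair rule (drop duplicates and out-of-range indices, then append missing candidates in ascending order) are all assumptions made for illustration. The sketch also assumes the text passed to `repair_ranking` is only the model's final ranking line, not its full chain-of-thought.

```python
import re
from typing import Callable, List, Optional

# Hypothetical model interface: takes a prompt, returns the reply text,
# or None when the request was refused (assumed convention, not from the paper).
Model = Callable[[str], Optional[str]]


def route(prompt: str, primary: Model, fallback: Model) -> Optional[str]:
    """Query the primary model; use the fallback only if the primary refuses."""
    reply = primary(prompt)
    if reply is None:
        reply = fallback(prompt)
    return reply


def repair_ranking(raw: str, n_candidates: int) -> List[int]:
    """Turn imperfect model output into a full permutation of 1..n_candidates."""
    ranking: List[int] = []
    for token in re.findall(r"\d+", raw):
        idx = int(token)
        # Keep only valid candidate indices, first occurrence wins.
        if 1 <= idx <= n_candidates and idx not in ranking:
            ranking.append(idx)
    # Append any candidates the model omitted, in ascending order.
    ranking.extend(i for i in range(1, n_candidates + 1) if i not in ranking)
    return ranking


def rank_images(prompt: str, n_candidates: int,
                primary: Model, fallback: Model) -> List[int]:
    """Route the prompt, then repair whatever ranking text comes back."""
    reply = route(prompt, primary, fallback)
    return repair_ranking(reply or "", n_candidates)
```

With this design, a refusal from both models still yields a well-formed (if uninformative) ranking in ascending order, so every item can be scored.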

WG3 Tasks: Task 3.5 Evaluation campaign: AdMIRe - Advancing Multimodal Idiomaticity Representation

Tracks For Type Of Contribution: Complete work (including previously published work)

Do You Need Visa To Attend The 4th UniDive General Meeting In Romania: Yes

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 60
