Split-Wise Evaluation for Turkish Light Verb Construction Detection

Published: 27 May 2026, Last Modified: 29 May 2026
Venue: UniDive 2026
Readers: Everyone
License: CC BY-SA 4.0
Keywords: Turkish, light verb constructions, multiword expressions, large language models, prompting
Working Group: WG1: Corpus annotation, WG2: Lexicon-corpus interface, WG3: Multilingual and cross-lingual language technology, WG4: Quantifying and promoting diversity
WG1 Tasks: Task 1.2 on MWE annotation guidelines and UD-PARSEME unification
Abstract: We present a controlled benchmark for Turkish light verb construction detection, comparing supervised Turkish encoders with instruction-tuned large language models under zero-shot, one-shot, and few-shot prompting. Using a balanced diagnostic dataset that contrasts true light verb constructions with literal and random negatives, we show that zero-shot LLMs are highly conservative and miss most positive cases, one-shot prompting often shifts models toward overprediction, and few-shot prompting improves performance but remains strongly model-dependent. In contrast, supervised BERTurk models are more stable across conditions, suggesting that Turkish light verb constructions provide a useful diagnostic for evaluating lexicalized predicate meaning, prompt sensitivity, and model calibration in NLP.
WG3 Tasks: Task 3.4 Evaluation campaign: PARSEME 2.0: a multilingual shared task proposal on identification and paraphrasing of multiword expressions
WG4 Tasks: Task 4.4: Benchmarking for diversity
Tracks For Type Of Contribution: Work in progress
Do You Need Visa To Attend The 4th UniDive General Meeting In Romania: Yes
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 54