Keywords: idiom translation, low-resource MT, automatic evaluation, multilingual corpora, corpus creation, Urdu–English
Abstract: We present a comprehensive evaluation of Urdu–English idiomatic translation, introducing a parallel benchmark encompassing Native and Roman Urdu across translation, paraphrasing, idiom span detection, and idiomatic back-translation. Eight prompting strategies, including literal, cultural, idiomatic, and few-shot prompts, are used to assess multiple open-source LLMs and NMT systems. Performance is evaluated using BLEU, ChrF, BERTScore, COMET, XCOMET, ROUGE, Levenshtein distance, and multilingual embedding cosine similarities (LASER, LaBSE, USE). Results indicate that LLMs outperform NMT systems in preserving idiomatic meaning, with cultural and idiomatic prompts yielding the highest semantic fidelity. Few-shot prompting further improves idiom handling. Native Urdu consistently achieves higher scores than Roman Urdu across all tasks, highlighting the influence of script on translation quality. This study provides the first multi-metric, cross-script benchmark for idiomatic Urdu–English translation, offering insights into model behavior, prompt sensitivity, and the challenges of Romanized input.
Paper Type: Long
Research Area: Multilinguality and Language Diversity
Research Area Keywords: multilingual benchmarks, multilingual evaluation, less-resourced languages, corpus creation, benchmarking, datasets for low resource languages, few-shot/zero-shot MT, multi-word expressions
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Publicly available software and/or pre-trained models, Data resources
Languages Studied: Urdu, English
Submission Number: 1611