Urdu-GLUE: A Comprehensive Benchmark and Dynamic Prompt-Based Fine-Tuning for Urdu Language Understanding
Keywords: Urdu-GLUE, Low-Resource Languages, Natural Language Understanding, Prompt-Based Fine-Tuning, ADAPT, Urdu NLP
Abstract: Language understanding benchmarks have driven significant progress in Natural Language Processing (NLP). However, most benchmarks focus on high-resource languages such as English, leaving low-resource languages underserved. Despite being spoken by over 246 million people worldwide, Urdu lacks comprehensive evaluation resources. To address this gap, we introduce Urdu-GLUE, the first comprehensive benchmark for Urdu language understanding. The benchmark comprises ten diverse tasks, spanning single-sentence classification, similarity and paraphrase detection, natural language inference, question answering, and sequence labeling. To cover all benchmark tasks, we created four new datasets: (1) U-CoLA for grammatical acceptability, (2) U-WNLI for Winograd-style coreference, (3) U-STS-B for semantic similarity, and (4) U-XNLI, a preprocessed XNLI dataset. To ensure quality, three native Urdu speakers fluent in English manually verified each dataset. To address the low-resource status of Urdu, we also introduce ADAPT (Adaptive Dynamic Prompt Template), the first dynamic prompt-based fine-tuning strategy for encoder-based models. ADAPT systematically explores various prompt templates during training and automatically identifies the most effective template for inference. We evaluate multiple fine-tuning (FT) strategies, including standard FT, prompt-based FT, LoRA, QLoRA, and ADAPT, across three experimental settings: zero-shot, 16-shot, and an 80/20 split. Our experiments demonstrate that prompt-based FT methods consistently outperform standard FT in few-shot settings. Our findings provide practical insights for low-resource NLP research. To facilitate future work, we publicly\footnote{https://anonymous.4open.science/r/Urdu-Glue-7D78/README.md} release all datasets and code.
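The core loop behind ADAPT, as described in the abstract, can be sketched in a few lines: score each candidate prompt template during training and retain the highest-scoring one for inference. This is a minimal illustrative sketch only, not the authors' implementation; the function names, the `{sentence}` placeholder convention, and the toy keyword classifier are all assumptions introduced for illustration.

```python
def fill(template, sentence):
    # Insert the input sentence into a cloze-style template
    # (hypothetical "{sentence}" placeholder convention).
    return template.replace("{sentence}", sentence)

def score_template(template, dev_set, classify):
    # Fraction of dev examples the classifier labels correctly
    # when inputs are wrapped in this template.
    correct = sum(classify(fill(template, x)) == y for x, y in dev_set)
    return correct / len(dev_set)

def select_template(templates, dev_set, classify):
    # Keep the template with the highest dev-set accuracy for inference.
    return max(templates, key=lambda t: score_template(t, dev_set, classify))

# Toy demonstration: a stand-in classifier that only sees the filled prompt.
templates = ["Is this acceptable: {sentence}", "Label:"]
dev_set = [("good sentence", 1), ("bad one", 0)]
classify = lambda prompt: 1 if "good" in prompt else 0
best = select_template(templates, dev_set, classify)
```

In practice the scoring step would run the fine-tuned encoder over a validation split rather than a keyword heuristic, but the selection logic is the same.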
Paper Type: Long
Research Area: Low-resource Methods for NLP
Research Area Keywords: benchmarking, multilingual benchmarks, less-resourced languages, NLP datasets, evaluation methodologies, few-shot learning, data-efficient training
Contribution Types: Approaches to low-resource settings, Data resources
Languages Studied: Urdu
Submission Number: 5359