Keywords: Urdu NLP; Low-Resource Languages; LLM Benchmarking; Safety and Moderation; Efficient Language Models; Zero-Shot Evaluation
Abstract: Large Language Models (LLMs) have driven rapid advances in natural language processing (NLP); however, low-resource languages such as Urdu, spoken by over 230 million people, remain severely underrepresented, limiting equitable deployment and widening multilingual performance gaps. Existing Urdu benchmarks are fragmented or translation-dependent, and lack a unified framework for evaluating emerging efficient models on native, culturally grounded tasks.
We present $UrduBench$, a comprehensive benchmark comprising 20 datasets across 17 tasks for Urdu LLM evaluation, covering natural language understanding, safety-critical moderation, and generation. We also release a modular, open-source evaluation framework enabling reproducible zero-shot evaluation with uniform prompting and metrics.
Using this framework, we benchmark 13 open-weight instruction-tuned LLMs spanning nano (<1B), small (1–3B), and medium (up to 7B) parameter scales, focusing on models that are computationally efficient and suitable for deployment in low-resource settings. Results show pronounced performance disparities across model sizes and task categories, with persistent difficulties in Urdu sequence labeling and generation, and consistent gains from larger multilingual models.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, evaluation methodologies, evaluation, automatic evaluation
Contribution Types: Approaches to low-resource settings, Approaches to low-compute settings (efficiency), Publicly available software and/or pre-trained models
Languages Studied: Urdu
Submission Number: 9930