Abstract: The rapid adoption of large language models (LLMs) has raised critical concerns regarding the factual reliability of their outputs, especially in low-resource languages such as Urdu. Existing automated fact-checking solutions overwhelmingly focus on English, leaving a significant gap for the 200+ million Urdu speakers worldwide. In this work, we introduce UrduFactCheck, the first comprehensive, modular fact-checking framework specifically tailored for Urdu. Our system features a dynamic, multi-strategy evidence retrieval pipeline that combines monolingual and translation-based approaches to address the scarcity of high-quality Urdu evidence. We curate and release two new hand-annotated benchmarks: UrduFactBench for claim verification and UrduFactQA for evaluating LLM factuality. Extensive experiments demonstrate that UrduFactCheck, particularly its translation-augmented variants, consistently outperforms baselines and open-source alternatives on multiple metrics. We further benchmark twelve state-of-the-art (SOTA) LLMs on factual question answering in Urdu, highlighting persistent gaps between proprietary and open-source models. UrduFactCheck's code and datasets are open-sourced and publicly available at [URL redacted].
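The abstract describes a monolingual-plus-translation evidence retrieval strategy. The minimal Python sketch below illustrates one way such a pipeline could be structured: retrieve Urdu evidence first, then fall back to translation-augmented retrieval when monolingual results are sparse. All function names (search_ur, search_en, translate_ur_en, translate_en_ur) and the min_docs threshold are illustrative assumptions, not the paper's actual interfaces.

```python
from typing import Callable, List

def retrieve_evidence(
    claim_ur: str,
    search_ur: Callable[[str], List[str]],    # assumed: monolingual Urdu retriever
    search_en: Callable[[str], List[str]],    # assumed: English-language retriever
    translate_ur_en: Callable[[str], str],    # assumed: Urdu -> English MT
    translate_en_ur: Callable[[str], str],    # assumed: English -> Urdu MT
    min_docs: int = 3,                        # assumed threshold for "enough" evidence
) -> List[str]:
    """Sketch of a multi-strategy retrieval step: prefer monolingual Urdu
    evidence, and augment with translation-based retrieval when it is scarce."""
    evidence = search_ur(claim_ur)
    if len(evidence) >= min_docs:
        return evidence
    # Translation-augmented fallback: query in English, map evidence back to Urdu.
    claim_en = translate_ur_en(claim_ur)
    evidence += [translate_en_ur(doc) for doc in search_en(claim_en)]
    return evidence
```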
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: NLP in resource-constrained settings, Benchmarking, Datasets for low-resource languages, Evaluation methodologies, Fact checking, Multilingual QA, Less-resourced languages
Contribution Types: Approaches to low-resource settings, Publicly available software and/or pre-trained models, Data resources
Languages Studied: Urdu
Submission Number: 3281