Consolidating and Developing Benchmarking Datasets for the Nepali Natural Language Understanding Tasks

ACL ARR 2025 July Submission 374 Authors

27 Jul 2025 (modified: 31 Aug 2025) · ACL ARR 2025 July Submission · CC BY 4.0
Abstract: The Nepali language has distinct linguistic features, most notably its complex Devanagari script, rich morphology, and varied dialects, which pose unique challenges for Natural Language Understanding (NLU) tasks. While the Nepali General Language Understanding Evaluation (Nep-gLUE) benchmark provides a foundation for evaluating models, it remains limited in scope, covering only four tasks, which restricts its utility for comprehensive assessment of Natural Language Processing (NLP) models. To address this limitation, we introduce twelve new datasets, creating a new benchmark, the Nepali Language Understanding Evaluation (NLUE) benchmark, for evaluating model performance across a diverse set of NLU tasks. The added tasks cover Single-Sentence Classification, Similarity and Paraphrase Tasks, Natural Language Inference (NLI), and a General Masked Evaluation Task (GMET). Through extensive experiments, we demonstrate that existing top models struggle with the added complexity of these tasks. We also find that the best multilingual model outperforms the best monolingual models across most tasks, highlighting the need for more robust solutions tailored to the Nepali language. This expanded benchmark sets a new standard for evaluating, comparing, and advancing models, contributing to the broader goal of NLP research for low-resource languages.
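
To make the evaluation setting concrete, the sketch below shows one plausible way to fine-tune and score a multilingual baseline on a single NLUE-style classification task with the Hugging Face transformers and datasets libraries. The dataset identifier nlue/sentiment, the split and column names, the three-way label scheme, and the choice of xlm-roberta-base are illustrative assumptions, not artifacts or settings taken from the paper.

    # Hypothetical sketch: fine-tune a multilingual encoder on one NLUE-style
    # classification task. Dataset ID, split/column names, and label count are assumed.
    from datasets import load_dataset
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )

    MODEL = "xlm-roberta-base"  # a common multilingual baseline (assumption)
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=3)

    dataset = load_dataset("nlue/sentiment")  # hypothetical dataset identifier

    def tokenize(batch):
        # The pretrained SentencePiece vocabulary covers Devanagari script,
        # so Nepali text needs no extra preprocessing here.
        return tokenizer(batch["text"], truncation=True,
                         padding="max_length", max_length=128)

    tokenized = dataset.map(tokenize, batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="nlue-sentiment",
                               per_device_train_batch_size=16,
                               num_train_epochs=3),
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["validation"],
    )
    trainer.train()
    print(trainer.evaluate())  # reports eval loss; add compute_metrics for accuracy/F1

The same loop would apply to any of the single-sentence or sentence-pair tasks by swapping the dataset and label count; masked-evaluation tasks such as GMET would instead use a masked language modeling head.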
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: NLP, Benchmarking, Evaluation, NLU, Nepali
Contribution Types: Approaches to low-resource settings, Data resources
Languages Studied: Nepali
Previous URL: https://openreview.net/forum?id=d5RKVv8EEy
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: No, I want the same area chair from our previous submission (subject to their availability).
Reassignment Request Reviewers: No, I want the same set of reviewers from our previous submission (subject to their availability).
Software: zip
Data: zip
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: 1, 2, 4
B2 Discuss The License For Artifacts: N/A
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: 4, A, B
B4 Data Contains Personally Identifying Info Or Offensive Content: N/A
B5 Documentation Of Artifacts: Yes
B5 Elaboration: 4, A
B6 Statistics For Data: Yes
B6 Elaboration: 1, 4
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: 4, 5, D, E
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: 5, E
C3 Descriptive Statistics: Yes
C3 Elaboration: 4, E
C4 Parameters For Packages: N/A
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: 1, 4, 8, B, C
Author Submission Checklist: Yes
Submission Number: 374