GovRelBench: A Benchmark for Government Domain Relevance

ACL ARR 2025 July Submission 934 Authors

29 Jul 2025 (modified: 21 Aug 2025) · CC BY 4.0
Abstract: Current evaluations of LLMs in the government domain focus primarily on safety considerations in specific scenarios, while assessment of the models' core capabilities, particularly domain relevance, remains insufficient. To address this gap, we propose GovRelBench, a benchmark specifically designed for evaluating the core capabilities of LLMs in the government domain. GovRelBench consists of government-domain prompts and a dedicated evaluation tool, GovRelBERT. To train GovRelBERT, we introduce the SoftGovScore method, which converts hard category labels into soft relevance scores and fine-tunes a ModernBERT-based model on them, enabling accurate computation of a text's government-domain relevance score. This work strengthens the capability evaluation framework for large models in the government domain and provides an effective tool for related research and practice.
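To make the SoftGovScore idea concrete, below is a minimal sketch of soft-score regression fine-tuning of a ModernBERT-based relevance scorer. It assumes the HuggingFace transformers/torch APIs and the answerdotai/ModernBERT-base checkpoint; the hard-label-to-soft-score mapping and the example data are hypothetical, and the paper's actual training recipe may differ.

```python
# Minimal sketch of soft-score regression fine-tuning in the spirit of SoftGovScore.
# Assumptions (not from the paper): the answerdotai/ModernBERT-base checkpoint,
# the HuggingFace transformers/torch APIs, and the example label-to-score mapping.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical mapping from hard category labels to soft relevance scores in [0, 1].
HARD_TO_SOFT = {"government": 0.9, "mixed": 0.5, "general": 0.1}

class RelevanceDataset(Dataset):
    def __init__(self, texts, hard_labels, tokenizer, max_len=512):
        self.enc = tokenizer(texts, truncation=True, max_length=max_len,
                             padding="max_length", return_tensors="pt")
        self.scores = torch.tensor([HARD_TO_SOFT[l] for l in hard_labels],
                                   dtype=torch.float)

    def __len__(self):
        return len(self.scores)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.scores[i]  # soft score used as regression target
        return item

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base", num_labels=1, problem_type="regression")

texts = ["Notice on municipal budget disclosure", "Recipe for braised pork"]
labels = ["government", "general"]
loader = DataLoader(RelevanceDataset(texts, labels, tokenizer), batch_size=2)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for batch in loader:
    out = model(**batch)  # with problem_type="regression", loss is MSE vs. soft scores
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Inference: the single regression logit is the predicted relevance score.
model.eval()
with torch.no_grad():
    enc = tokenizer("Provincial guideline on data governance", return_tensors="pt")
    score = model(**enc).logits.squeeze().item()
print(f"predicted relevance score: {score:.3f}")
```

The design choice illustrated here is to treat domain relevance as a continuous regression target rather than a classification label, so texts that are only partially governmental receive intermediate scores instead of being forced into a single class.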
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, evaluation methodologies, NLP datasets, metrics
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: Chinese
Submission Number: 934