MRCEval: A Comprehensive, Challenging and Accessible Machine Reading Comprehension Benchmark

ACL ARR 2025 February Submission 5300 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Machine Reading Comprehension (MRC) is an essential task for evaluating natural language understanding. Previous MRC datasets each target specific reading comprehension skills and therefore fall short of the comprehensive benchmark needed to assess Large Language Models (LLMs) thoroughly. To fill this gap, we first introduce a novel taxonomy of the capabilities required for reading comprehension; based on this taxonomy, we then automatically build MRCEval, an MRC benchmark that employs powerful LLMs as sample generators and selection judges. MRCEval is a comprehensive, challenging, and accessible benchmark consisting of three main tasks and 13 sub-tasks, with a total of 2.2K high-quality multiple-choice questions. We perform an extensive evaluation of 28 widely used open-source and proprietary models, highlighting that MRC continues to present significant challenges even in the era of LLMs.
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: reading comprehension, benchmarking, language resources
Contribution Types: Data resources
Languages Studied: English
Submission Number: 5300