DocEE-zh: A Fine-grained Benchmark for Chinese Document-level Event Extraction

ACL ARR 2024 June Submission 3969 Authors

16 Jun 2024 (modified: 08 Aug 2024) | ACL ARR 2024 June Submission | Readers: Everyone | License: CC BY 4.0
Abstract: Event extraction aims to identify events and then extract the arguments involved in those events. In recent years, research has gradually shifted from sentence-level to document-level event extraction. Despite the significant success achieved in English event extraction, event extraction in Chinese remains largely unexplored. A major obstacle to advancing Chinese document-level event extraction is the lack of fine-grained datasets with wide domain coverage for model training and evaluation. In this paper, we propose DocEE-zh, a new Chinese document-level event extraction dataset comprising over 36,000 events and more than 210,000 arguments. DocEE-zh is an extension of the DocEE dataset that uses the same event schema, and all data has been meticulously annotated by human experts. We highlight two features: a focus on high-interest event types and fine-grained argument types. Experimental results indicate that state-of-the-art models still fail to achieve satisfactory performance (an F1 score of 68%), revealing that Chinese document-level event extraction remains an unresolved challenge.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Information Extraction, Resources and Evaluation
Contribution Types: Data resources
Languages Studied: Chinese
Submission Number: 3969