A large-scale dataset for Korean document-level relation extraction from encyclopedia texts

Published: 01 Jan 2024, Last Modified: 12 May 2025. Venue: Appl. Intell. 2024. License: CC BY-SA 4.0
Abstract: Document-level relation extraction (RE) aims to predict the relational facts that hold between two given entities in a document. Unlike the extensive research on document-level RE in English, research on Korean document-level RE is still at a very early stage because no dataset has been available. To accelerate such studies, we present the TREK (Toward Document-Level Relation Extraction in Korean) dataset, constructed from Korean encyclopedia documents written by domain experts. We provide detailed statistical analyses of our large-scale dataset, and human evaluation results support the quality of TREK. We also introduce a document-level RE model that takes named entity types into account while considering the properties of the Korean language. In experiments, we demonstrate that our proposed model outperforms the baselines, and we conduct a qualitative analysis.