DReD: A Descriptive Relation Dataset for Expanding Relation Extraction

Logan Markewich, Yubin Xing, Roy Ka-Wei Lee, Zhi Li, Seok-Bum Ko

Published: 01 Jan 2023, Last Modified: 14 Dec 2023. IEEE Trans. Artif. Intell. 2023. Readers: Everyone

Abstract: Relation extraction is a fundamental topic in document information extraction. Traditionally, datasets for relation extraction have been annotated with named entities and classified with a subset of relation categories. Models then either predict both the entities and the relations (end-to-end) or assume the entities are given and only classify the relations. However, current approaches are limited by datasets with a narrow definition of entities and relations. We seek to remedy this by introducing our Descriptive Relation Dataset (DReD), which contains 3286 annotations for descriptions of relations between more general noun phrases inspired by linguistic theory. We benchmark our dataset using several seq2seq models and find that T5 achieves the best results with a ROUGE-1 score of 75.5. We verify the usefulness of DReD by collecting feedback on 100 predictions and comparing human judgment to automated scoring methods. Finally, we verify that relations can be described accurately by transforming the CoNLL04 and Re-TACRED datasets and mapping sentence templates to relation categories. T5 achieves competitive accuracy on CoNLL04 and Re-TACRED with F1 scores of 78.6 and 90.4, respectively. With this article, we show that relations can be described, thereby overcoming the limitations set by previous datasets and approaches. We publicly provide our dataset and training code at https://github.com/logan-markewich/DReD.

Relation extraction is a powerful task, providing a method to extract labeled connections between words in a document. Existing datasets focus on relations between important named entities, with relations drawn from a list of predefined categories. These categories limit trained models, because a category name alone cannot capture important context. Our new Descriptive Relation Dataset, DReD, overcomes these limitations by allowing models to learn how to describe relations in a sentence. DReD contains 3286 annotations of descriptions of relations between general noun phrases, which provides a way to uncover previously unseen relation types along with meaningful context. Furthermore, any sequence-to-sequence model can be trained on DReD, allowing for flexible and future-proof applications.
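The abstract mentions mapping sentence templates to relation categories so that a description-generating model can be scored on category-based datasets. The sketch below illustrates one way such a mapping could work in both directions. It is not the authors' code: the template strings, the `TEMPLATES` table, and the helper names `to_description` and `to_relation` are all hypothetical, and the relation labels are written in a CoNLL04-like style purely for illustration.

```python
import re

# Hypothetical templates for illustration only; the templates actually
# used in the paper are not given in this abstract.
TEMPLATES = {
    "Work_For": "{head} works for {tail}",
    "Live_In": "{head} lives in {tail}",
    "Located_In": "{head} is located in {tail}",
}


def to_description(head, relation, tail):
    """Turn a (head, relation, tail) triple into a descriptive sentence."""
    return TEMPLATES[relation].format(head=head, tail=tail)


def _template_regex(template):
    """Compile a template into a regex with named groups for head and tail."""
    pattern = re.escape(template)
    # Swap the escaped placeholders for capture groups.
    pattern = pattern.replace(re.escape("{head}"), r"(?P<head>.+?)")
    pattern = pattern.replace(re.escape("{tail}"), r"(?P<tail>.+)")
    return re.compile(rf"^{pattern}$", re.IGNORECASE)


def to_relation(description):
    """Map a generated description back to a triple, or None if no template fits."""
    text = description.strip()
    for relation, template in TEMPLATES.items():
        match = _template_regex(template).match(text)
        if match:
            return match.group("head"), relation, match.group("tail")
    return None


if __name__ == "__main__":
    sentence = to_description("John Smith", "Work_For", "Acme")
    print(sentence)
    print(to_relation(sentence))
```

With a mapping of this kind, the training targets for a seq2seq model are the templated sentences, and any generated sentence that matches no template can simply be counted as an incorrect prediction when computing F1.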
