Relation Extraction for Constructing Knowledge Graphs: Enhancing the Searchability of Community-Generated Digital Content (CGDC) Collections

Published: 09 Jul 2024, Last Modified: 15 Jul 2024, DL4KG 2024, CC BY 4.0
Keywords: Relation Extraction, Zero-shot Prompting, Transformer Models, Knowledge Graphs, Cultural Heritage
TL;DR: Casting relation extraction as an NLI task, we employed BART, DeBERTa and T5 models in a zero-shot manner to extract entity relations, which are then used to curate a cultural heritage-focussed knowledge graph.
Abstract: Much of people's understanding of their cultural heritage is facilitated by the curation and preservation of community-generated digital content (CGDC): archival collections that were created for, with and by local communities. However, communities employ their own conventions in storing and publishing their content. Given this and the fact that semantic information tends to be buried within textual descriptions, CGDC archives are currently siloed and obscured, making it difficult for end-users (e.g., members of the public and researchers) to search for fine-grained information (e.g., "Where did Alfred Edward Julian work?"). In this paper, we propose to represent the information within CGDC archives in the form of knowledge graphs. To enable the construction of such knowledge graphs at scale, we developed a zero-shot approach for relation extraction, which we cast as a natural language inference (NLI) problem. Specifically, for each of the 20 relation types drawn from Wikidata that we have identified as relevant to CGDC, we created a premise-hypothesis pair that is presented to an NLI model, which determines whether entailment (and thus the relation type) holds. The premise is a sentence from the natural language description, and the hypothesis is automatically generated using a template specific to the relation type. We present the results of comparing and combining three different transformer-based models that were already fine-tuned for the NLI task, namely, DeBERTa, BART and T5.
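
To illustrate the NLI formulation described in the abstract, the minimal sketch below scores entailment between a description sentence (the premise) and a templated hypothesis for a single relation type. It is not the authors' implementation: the facebook/bart-large-mnli checkpoint, the "employer" template, the decision threshold, and the example sentence are all assumptions made purely for demonstration.

```python
# Minimal sketch: zero-shot relation extraction cast as NLI (not the authors' code).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "facebook/bart-large-mnli"  # assumed NLI checkpoint; any NLI-fine-tuned model could be substituted
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

# Hypothetical hypothesis template for one relation type (Wikidata P108, "employer");
# the paper defines its own templates for 20 Wikidata relation types.
TEMPLATES = {"employer": "{subject} worked for {object}."}

def relation_holds(premise: str, relation: str, subject: str, obj: str,
                   threshold: float = 0.5) -> bool:
    """Return True if the premise entails the templated hypothesis for the relation."""
    hypothesis = TEMPLATES[relation].format(subject=subject, object=obj)
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)[0]
    # bart-large-mnli outputs contradiction / neutral / entailment probabilities.
    entailment_prob = probs[model.config.label2id["entailment"]].item()
    return entailment_prob >= threshold

# Made-up description sentence, used only to show the call pattern.
premise = "Alfred Edward Julian worked for the local post office for many years."
print(relation_holds(premise, "employer", "Alfred Edward Julian", "the local post office"))
```

In this formulation, each candidate (sentence, relation type, entity pair) triple yields one NLI query, so combining several NLI models, as the abstract describes, amounts to aggregating their entailment scores per query.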
Submission Number: 3