Keywords: knowledge base construction, named entity recognition and relation extraction, entity linking/disambiguation
TL;DR: A new dataset aligning KG updates with emerging textual knowledge, introducing operations for updating KGs from evolving textual sources.
Abstract: Knowledge Graphs (KGs) are structured knowledge repositories containing entities and relations between them. In this paper, we study the problem of automatically updating KGs over time in response to evolving knowledge in unstructured textual sources. Addressing this problem requires identifying a wide range of update operations based on the state of an existing KG at a given time and the information extracted from text. This contrasts with traditional information extraction pipelines, which extract knowledge from text independently of the current state of a KG. To address this challenge, we propose a method for constructing a dataset consisting of Wikidata KG snapshots over time and Wikipedia passages paired with the corresponding edit operations that they induce in a particular KG snapshot. We obtain these pairs by aligning annotated hyperlinked entity mentions in each Wikipedia passage with the corresponding entities involved in the updated Wikidata triples. We verify, using LLMs with human validation, that these textual passages contain the knowledge needed to support the associated KG edits. The resulting dataset comprises 233K Wikipedia passages aligned with a total of 1.45 million KG edits across 7 yearly snapshots of Wikidata from 2019 to 2025. Our experimental results highlight key challenges in updating KG snapshots based on emerging textual knowledge, particularly in integrating knowledge expressed in text with the existing KG structure. These findings position the dataset as a valuable benchmark for future research. We will publicly release our dataset and model implementations.
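The abstract describes aligning KG edits (triples that change between Wikidata snapshots) with Wikipedia passages whose hyperlinked mentions cover the entities involved in those edits. The following is a minimal sketch of that alignment idea, not the authors' implementation; all names (`Triple`, `Passage`, `diff_snapshots`, `align`) and the example QIDs/PIDs are hypothetical placeholders.

```python
# Hypothetical sketch: pair triples added/removed between two KG snapshots with
# passages whose hyperlinked entity mentions include both entities of the triple.
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    subject: str    # Wikidata QID, e.g. "Q42"
    predicate: str  # Wikidata PID, e.g. "P166"
    obj: str        # Wikidata QID (or literal)

@dataclass
class Passage:
    text: str
    linked_qids: set  # QIDs of hyperlinked entity mentions in the passage

def diff_snapshots(old_kg, new_kg):
    """KG edits between two yearly snapshots: triples added and triples removed."""
    return {"add": new_kg - old_kg, "delete": old_kg - new_kg}

def align(edits, passages):
    """Pair each edit with passages mentioning both the subject and object entity."""
    pairs = []
    for op, triples in edits.items():
        for t in triples:
            for p in passages:
                if t.subject in p.linked_qids and t.obj in p.linked_qids:
                    pairs.append((op, t, p))
    return pairs

if __name__ == "__main__":
    kg_2019 = {Triple("Q42", "P69", "Q34433")}
    kg_2020 = {Triple("Q42", "P69", "Q34433"), Triple("Q42", "P166", "Q187655")}
    passages = [Passage("Douglas Adams received the award ...", {"Q42", "Q187655"})]
    print(align(diff_snapshots(kg_2019, kg_2020), passages))
```

In practice, candidate pairs produced by such an alignment would still need verification (e.g., the LLM-plus-human validation mentioned above) that the passage actually expresses the knowledge behind the edit.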
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 18427