Experiences of Using WDumper to Create Topical Subsets from Wikidata

Published: 23 Apr 2021, Last Modified: 05 May 2023, KGCW 2021, Readers: Everyone
Keywords: wikidata, knowledge graph subsetting, topical subset, wdumper
TL;DR: In this paper, we build and evaluate, for the first time, practical topical subsets of Wikidata using the WDumper tool.
Abstract: Wikidata is a general-purpose knowledge graph that covers a wide variety of topics, with content crowd-sourced through an open wiki. It now contains over 90M interrelated data items, which are accessible through a public query endpoint and data dumps. However, execution timeout limits and the size of the data dumps make the data difficult to use. Creating arbitrary topical subsets of Wikidata, in which only the relevant data is kept, would enable reuse of that data with the benefits of reduced cost, easier access, and greater flexibility. In this paper, we provide a formal definition of topical subsets over the Wikidata knowledge graph and evaluate a third-party tool (WDumper) for extracting these topical subsets from Wikidata.