Experiences of Using WDumper to Create Topical Subsets from Wikidata

Mar 12, 2021 (edited Apr 23, 2021) | ESWC 2021 Workshop KGCW Submission | Readers: Everyone
  • Keywords: wikidata, knowledge graph subsetting, topical subset, wdumper
  • TL;DR: This paper is the first to build and evaluate practical topical subsets of Wikidata using the WDumper tool.
  • Abstract: Wikidata is a general-purpose knowledge graph that covers a wide variety of topics, with content crowd-sourced through an open wiki. It now holds over 90M interrelated data items, which are accessible through a public query endpoint and data dumps. However, the endpoint's query execution timeout limits and the size of the data dumps make the data difficult to use. Creating arbitrary topical subsets of Wikidata, in which only the relevant data is kept, would enable reuse of that data with the benefits of cost reduction, ease of access, and flexibility. In this paper, we provide a formal definition of topical subsets over the Wikidata Knowledge Graph and evaluate a third-party tool (WDumper) for extracting these topical subsets from Wikidata.
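
As a rough illustration of what "only the relevant data is kept" could mean in practice, the sketch below filters entity records by their "instance of" (P31) values. It is not WDumper's implementation and not the paper's formal definition: it assumes a topic is given as a set of class identifiers, and that each record follows the layout of the Wikidata JSON dump (`claims` -> property -> `mainsnak` -> `datavalue`). The function names and the toy entity IDs are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Set

INSTANCE_OF = "P31"  # Wikidata property "instance of"


def instance_of_ids(entity: Dict) -> Set[str]:
    """Collect the item IDs that this entity is declared an instance of."""
    ids: Set[str] = set()
    for statement in entity.get("claims", {}).get(INSTANCE_OF, []):
        snak = statement.get("mainsnak", {})
        # "novalue" and "somevalue" snaks carry no datavalue, so skip them.
        if snak.get("snaktype") != "value":
            continue
        value = snak.get("datavalue", {}).get("value", {})
        if isinstance(value, dict) and "id" in value:
            ids.add(value["id"])
    return ids


def topical_subset(entities: Iterable[Dict], topic_classes: Set[str]) -> Iterator[Dict]:
    """Yield only the entities that are instances of at least one topic class.

    Assumption: the topic is defined purely by direct P31 values; subclass
    hierarchies and other selection criteria are ignored in this sketch.
    """
    for entity in entities:
        if instance_of_ids(entity) & topic_classes:
            yield entity


def toy_entity(entity_id: str, class_ids: List[str]) -> Dict:
    """Build a minimal record in the dump layout (hypothetical helper for the demo)."""
    statements = [
        {"mainsnak": {"snaktype": "value", "property": INSTANCE_OF,
                      "datavalue": {"type": "wikibase-entityid",
                                    "value": {"entity-type": "item", "id": cid}}}}
        for cid in class_ids
    ]
    return {"id": entity_id, "type": "item", "claims": {INSTANCE_OF: statements}}


# Toy data: the entity IDs are placeholders, not real Wikidata items.
sample = [
    toy_entity("Q_toy_compound", ["Q11173"]),  # Q11173: chemical compound
    toy_entity("Q_toy_person", ["Q5"]),        # Q5: human
    {"id": "Q_toy_untyped", "type": "item", "claims": {}},
]

kept_ids = [e["id"] for e in topical_subset(sample, {"Q11173"})]
print(kept_ids)
```

Because the filter is a generator over an iterable, it could in principle be fed one record at a time from a dump file rather than loading the whole dump into memory; a real subsetting tool would also need to decide how to treat statements that point outside the subset.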
