How and Why is An Answer (Still) Correct? Maintaining Provenance in Dynamic Knowledge Graphs

Garima Gaur, Arnab Bhattacharya, Srikanta Bedathur

2020 (modified: 23 Dec 2022)CIKM 2020Readers: Everyone

Abstract: Knowledge graphs (KGs), that have become the backbone of many critical knowledge-centric applications, are mostly automatically constructed based on an ensemble of extraction techniques applied over diverse data sources. It is, therefore, important to establish the provenance of results for a query to determine how these were computed. Provenance is shown to be useful for assigning confidence scores to the results, for debugging the KG generation itself, and for providing answer explanations. In many such applications, certain queries are registered as standing queries since their answers are needed often. However, KGs keep continuously changing due to reasons such as changes in the source data, improvements to the extraction techniques, refinement/enrichment of information, and so on. This raises the issue of efficiently maintaining the provenance polynomials of complex graph pattern queries for dynamic and large KGs instead of having to recompute them from scratch each time the KG is updated. Addressing this issue, we present a framework HUKA that uses provenance polynomials for tracking the derivation of query results over knowledge graphs by encoding the edges involved in generating the answer. More importantly, HUKA also maintains these provenance polynomials in the face of updates---insertions as well as deletions of facts---in the underlying KG. Experimental results over large real-world KGs such as YAGO and DBpedia with various benchmark SPARQL query workloads reveals that HUKA can be almost 50 times faster than existing systems for provenance computation on dynamic KGs.

0 Replies