Streaming Partitioning of RDF Graphs for Datalog Reasoning

Published: 23 Feb 2021, Last Modified: 05 May 2023, ESWC 2021 Research, Readers: Everyone
Keywords: datalog, materialisation, streaming, partitioning, distributed, reasoning, RDF, knowledge bases, graphs
Abstract: A cluster of servers is often used to reason over RDF graphs whose size exceeds the capacity of a single server. While many distributed approaches to reasoning have been proposed, the problem of data partitioning has so far received little attention. In practice, data is usually partitioned using a variant of hashing, which is simple but ignores data locality. Locality-aware partitioning approaches have been considered, but they usually process the entire dataset on a single server. In this paper, we present two new RDF partitioning strategies. Both are inspired by recent streaming graph partitioning algorithms [hdrf:2015, 2ps], which partition a graph while keeping only a small subset of it in memory. We evaluated our approaches empirically against hash and min-cut partitioning. Our results suggest that our approaches can significantly improve reasoning performance without placing unrealistic demands on the memory of the servers used for partitioning.
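To make the contrast between hash partitioning and streaming partitioning concrete, here is a minimal Python sketch. It assumes that each triple is treated as an edge from subject to object. It compares subject-hash partitioning with a simplified greedy streaming vertex-cut heuristic in the style of HDRF [hdrf:2015]. This is not the paper's RDF partitioning strategy. The function names and the parameters `lam` and `eps` are illustrative assumptions. The streaming pass sees each triple once and consults only per-vertex state (partial degrees and replica sets) and partition sizes, not the previously assigned triples.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def hash_partition(triples: List[Triple], k: int) -> List[List[Triple]]:
    """Baseline: place each triple by a hash of its subject (ignores locality)."""
    parts: List[List[Triple]] = [[] for _ in range(k)]
    for s, p, o in triples:
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        parts[zlib.crc32(s.encode("utf-8")) % k].append((s, p, o))
    return parts


def hdrf_partition(triples: List[Triple], k: int,
                   lam: float = 1.0, eps: float = 1e-3) -> List[List[Triple]]:
    """Simplified HDRF-style streaming vertex-cut partitioning (illustrative).

    `parts` stands in for shipping each triple to its server; the assignment
    decision itself only consults `sizes`, `degree` and `replicas`.
    """
    parts: List[List[Triple]] = [[] for _ in range(k)]
    sizes = [0] * k
    degree: Dict[str, int] = defaultdict(int)          # partial degrees so far
    replicas: Dict[str, Set[int]] = defaultdict(set)   # vertex -> partitions holding it
    for s, p, o in triples:
        degree[s] += 1
        degree[o] += 1
        total = degree[s] + degree[o]
        # theta(v): normalised partial degree of each endpoint.
        theta = {s: degree[s] / total, o: degree[o] / total}
        max_size, min_size = max(sizes), min(sizes)
        best, best_score = 0, float("-inf")
        for i in range(k):
            # Replication term: prefer partitions already holding an endpoint,
            # weighting the lower-degree endpoint more heavily.
            rep = sum(1.0 + (1.0 - theta[v]) for v in (s, o) if i in replicas[v])
            # Balance term: prefer less-loaded partitions.
            bal = lam * (max_size - sizes[i]) / (eps + max_size - min_size)
            if rep + bal > best_score:
                best, best_score = i, rep + bal
        parts[best].append((s, p, o))
        sizes[best] += 1
        replicas[s].add(best)
        replicas[o].add(best)
    return parts


def replication_factor(parts: List[List[Triple]]) -> float:
    """Average number of partitions in which each vertex (subject/object) appears."""
    where: Dict[str, Set[int]] = defaultdict(set)
    for i, part in enumerate(parts):
        for s, _, o in part:
            where[s].add(i)
            where[o].add(i)
    return sum(len(r) for r in where.values()) / len(where)


if __name__ == "__main__":
    triples = [(f"ex:s{i % 5}", "ex:p", f"ex:o{i % 7}") for i in range(40)]
    for name, fn in (("hash", hash_partition), ("hdrf", hdrf_partition)):
        parts = fn(triples, 3)
        print(name, [len(x) for x in parts], round(replication_factor(parts), 2))
```

A lower replication factor means that more triples sharing a vertex are co-located, which is the kind of locality that hash partitioning does not aim for. The `lam` parameter trades this locality off against load balance.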
Subtrack: Ontologies and Reasoning
First Author Is Student: Yes