Abstract: With the rapid growth of semantic data, scalable reasoning has attracted more and more attention. However, most existing work on scalable reasoning focuses only on RDFS/OWL ter Horst semantics, which are small fragments of OWL 2 RL and have limited expressivity. As OWL 2 RL semantics extended with SWRL rules can be expressed in Datalog, materialization of Datalog programs is widely adopted in traditional reasoners. In this paper, we propose a dependency-aware approach to parallel materialization of Datalog programs for scalable reasoning. We first present an algorithm that automates the translation of a Datalog rule execution into MapReduce jobs, and make several optimizations to the algorithm to speed up rule evaluation. Since the rule execution order has a significant impact on reasoning performance due to the dependencies among rules, we then propose a sampling-based method to capture rule dependencies and design a dependency-aware strategy to schedule rule evaluation. Finally, we build a system to evaluate the proposed approach with a series of semantic rule sets on large synthetic and real knowledge bases. The experimental results show that the proposed optimizations are highly effective and that our system achieves approximately linear scalability.