Keywords: RML, SPARQL, Big data, Semantic Query Optimization, Knowledge Graph Construction, RDF
TL;DR: Convert RML to SPARQL (with extensions); execute SPARQL with Apache Spark
Abstract: Approaches for the construction of knowledge graphs from heterogeneous data sources range from
ad-hoc scripts to dedicated mapping languages. Two common foundations in this space are RML and SPARQL.
So far, the two have largely been treated separately: on the one hand, there are tools dedicated to processing
RML; on the other hand, there are tools that extend SPARQL to incorporate additional data sources.
In this work, we first show how this gap can be bridged by translating RML into a sequence of SPARQL
CONSTRUCT queries and introduce the necessary SPARQL extensions. In a subsequent step, we apply
techniques that optimize both the SPARQL query workload and individual query execution times, yielding
a query sequence that is optimized with respect to the order and uniqueness of the generated triples.
Finally, we present a corresponding SPARQL query execution engine based on the Apache Spark Big
Data framework. In our evaluation on benchmarks, we show that our approach achieves RML mapping
execution performance that surpasses the current state of the art.
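To illustrate the idea, an RML triples map over a CSV source could, for example, be rewritten into a SPARQL CONSTRUCT query of roughly the following shape. This is a minimal sketch under stated assumptions: the x-csv: SERVICE scheme and the ex: row properties are illustrative placeholders, not the extension syntax introduced in the paper.

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX ex:   <http://example.org/>

    CONSTRUCT {
      ?person a foaf:Person ;
              foaf:name ?name .
    }
    WHERE {
      # hypothetical extension: a SERVICE clause that binds each row of a CSV source
      SERVICE <x-csv:persons.csv> {
        ?row ex:id ?id ;
             ex:name ?name .
      }
      # build the subject IRI from the row's id, mirroring an RML subject map template
      BIND(IRI(CONCAT("http://example.org/person/", ?id)) AS ?person)
    }

Each such CONSTRUCT query would correspond to one triples map, so the overall mapping becomes a sequence of queries that can then be reordered and deduplicated by the workload optimizer.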