Keywords: Semantic Web, Federated Query Processing, Source Selection, SPARQL
Abstract: Processing SPARQL queries over large federations of SPARQL endpoints
is crucial for keeping the Semantic Web decentralized. Despite the
existence of hundreds of SPARQL endpoints, current federation
engines only scale to dozens.
One major issue comes from the current definition of the source
selection problem, i.e., finding the minimal set of SPARQL endpoints
to contact per triple pattern. Even when such a source selection is
minimal, only a few of the resulting combinations of sources may
return results.
Consequently, most of the query processing time is wasted evaluating
combinations that return no results.
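The Python sketch below is a toy illustration of this gap. It uses hypothetical endpoints e1, e2 and e3 and predicates p and q, none of which come from the paper. A per-pattern source selection keeps every endpoint that holds at least one matching triple, so no selected source is redundant for its own pattern. The sketch then checks which of the resulting source combinations actually produce join results.

```python
from itertools import product

# Hypothetical toy federation: endpoint name -> set of (s, p, o) triples.
FEDERATION = {
    "e1": {("a", "p", "b"), ("c", "q", "d")},
    "e2": {("x", "p", "y"), ("b", "q", "z")},
    "e3": {("m", "p", "n"), ("k", "q", "l")},
}

# Query: ?s p ?j . ?j q ?o  (each pattern identified by its predicate, for brevity)
PATTERNS = ["p", "q"]

def relevant_sources(pred):
    """Per-pattern source selection: endpoints holding at least one matching triple."""
    return [e for e, triples in FEDERATION.items() if any(t[1] == pred for t in triples)]

def join_results(src_p, src_q):
    """Evaluate ?s p ?j . ?j q ?o with pattern 1 sent to src_p and pattern 2 to src_q."""
    left = [(s, o) for (s, p, o) in FEDERATION[src_p] if p == "p"]
    right = [(s, o) for (s, p, o) in FEDERATION[src_q] if p == "q"]
    return [(s, j, o) for (s, j) in left for (j2, o) in right if j == j2]

selection = [relevant_sources(p) for p in PATTERNS]
combinations = list(product(*selection))
productive = [c for c in combinations if join_results(*c)]

print(len(combinations), productive)  # → 9 [('e1', 'e2')]
```

Eight of the nine source combinations return nothing. A plan that keeps only the productive combination ("e1", "e2") avoids evaluating them.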
In this paper, we introduce the concept of Result-Aware query plans,
which ensure that every subquery of the query plan effectively
contributes to the results of the query. We propose FedUP, a new
federation engine that computes Result-Aware query plans by tracking
the provenance of query results.
However, getting query results requires computing source selection,
and computing source selection requires query results. To break this
vicious cycle, FedUP computes results and provenances on tiny quotient
summaries of federations at the cost of source selection accuracy.
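The following sketch illustrates the idea under stated assumptions, and FedUP's actual summaries and provenance model may differ. We assume a quotient that maps each subject and object IRI to its authority (scheme and host) while keeping predicates unchanged. Provenance here is simply the endpoint a triple came from. The endpoints e1 to e4 and the x.org vocabulary are hypothetical. The same join is evaluated on the real triples and on the summary, and the sketch collects the source assignments that yield results.

```python
from urllib.parse import urlparse

# Hypothetical federation: endpoint -> set of (s, p, o) IRI triples.
FEDERATION = {
    "e1": {("http://a.org/alice", "http://x.org/knows", "http://b.org/bob")},
    "e2": {("http://b.org/bob", "http://x.org/livesIn", "http://c.org/paris"),
           ("http://b.org/carol", "http://x.org/livesIn", "http://c.org/rome")},
    "e3": {("http://b.org/dave", "http://x.org/livesIn", "http://c.org/oslo")},
    "e4": {("http://d.org/dan", "http://x.org/livesIn", "http://c.org/rome")},
}
KNOWS, LIVES = "http://x.org/knows", "http://x.org/livesIn"

def authority(iri):
    """Assumed quotient on terms: collapse an IRI to scheme + host."""
    u = urlparse(iri)
    return f"{u.scheme}://{u.netloc}"

def summarize(triple):
    """Summarize subject and object; predicates are kept as-is."""
    s, p, o = triple
    return (authority(s), p, authority(o))

# Triples annotated with provenance: (triple, endpoint it came from).
REAL = {(t, e) for e, triples in FEDERATION.items() for t in triples}
SUMMARY = {(summarize(t), e) for (t, e) in REAL}

def evaluate(annotated):
    """Evaluate ?x knows ?y . ?y livesIn ?z and return the set of
    (source of pattern 1, source of pattern 2) pairs that produce results."""
    left = [(s, o, e) for ((s, p, o), e) in annotated if p == KNOWS]
    right = [(s, o, e) for ((s, p, o), e) in annotated if p == LIVES]
    return {(el, er) for (_, y, el) in left for (y2, _, er) in right if y == y2}

print(sorted(evaluate(SUMMARY)))  # → [('e1', 'e2'), ('e1', 'e3')]
print(sorted(evaluate(REAL)))     # → [('e1', 'e2')]
```

On the summary, e4 is pruned for the second pattern. The combination ("e1", "e3") is wrongly retained because Bob's and Dave's IRIs share an authority, and this spurious combination is the kind of accuracy cost described above. The mapping is applied uniformly to every triple, so each real join result still maps to a join result on the summary, and in this conjunctive example no productive combination is lost.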
Experimental results on federated benchmarks demonstrate that FedUP
outperforms state-of-the-art federation engines by orders of
magnitude in the context of large-scale federations.
Track: Semantics and Knowledge
Submission Guidelines Scope: Yes
Submission Guidelines Blind: Yes
Submission Guidelines Format: Yes
Submission Guidelines Limit: Yes
Submission Guidelines Authorship: Yes
Student Author: Yes
Submission Number: 2408