A Semantic Data Parallel Query Method Based on Hadoop

Published: 2016, Last Modified: 14 May 2023, WISE (1) 2016, Readers: Everyone
Abstract: To query large-scale RDF data efficiently, we designed PAQS, a parallel two-phase query strategy based on MapReduce. It consists of two stages: the SPARQL pretreatment stage and the distributed query execution stage. In the SPARQL pretreatment stage, a SPARQL query classification algorithm determines the join order of the connection variables by calculating the correlation between the variables in a SPARQL query statement; the joins between SPARQL clauses are then divided into the minimum number of MapReduce jobs according to the connection variables. The distributed query execution stage executes the query over large-scale RDF data in parallel, using the MapReduce jobs produced by the pretreatment stage. Experimental results on the LUBM benchmark indicate that PAQS queries large-scale RDF data with good efficiency, stability, and scalability.
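The pretreatment idea of choosing join variables so that the clauses collapse into few jobs can be illustrated with a minimal sketch. It is not the PAQS algorithm itself: the abstract does not define the correlation measure, so this sketch assumes, purely for illustration, that a variable's correlation is the number of clauses that mention it, and it uses a greedy heuristic that does not guarantee a minimum. The names `variables` and `plan_jobs` are hypothetical, and each planned job stands in for one MapReduce job that joins on a single variable.

```python
from collections import Counter


def variables(pattern):
    """Return the variables (terms starting with '?') of a triple pattern."""
    return frozenset(term for term in pattern if term.startswith("?"))


def plan_jobs(patterns):
    """Greedily plan join jobs for a list of triple patterns.

    Assumption (not from the paper): the "correlation" of a variable is the
    number of not-yet-joined relations that contain it. Each round joins all
    relations sharing the most-shared variable in a single job.
    Returns a list of (join_variable, number_of_relations_joined) tuples.
    """
    relations = [variables(p) for p in patterns]
    jobs = []
    while len(relations) > 1:
        # Count, per variable, how many relations mention it.
        counts = Counter(v for rel in relations for v in rel)
        if not counts:
            break
        # Highest count wins; ties are broken alphabetically for determinism.
        best = min(counts, key=lambda v: (-counts[v], v))
        if counts[best] < 2:
            break  # no variable is shared any more, nothing left to join
        joined = [rel for rel in relations if best in rel]
        rest = [rel for rel in relations if best not in rel]
        # The join result carries the union of the joined relations' variables.
        merged = frozenset().union(*joined)
        relations = rest + [merged]
        jobs.append((best, len(joined)))
    return jobs


# A LUBM-style query used only as sample input.
patterns = [
    ("?x", "rdf:type", "ub:GraduateStudent"),
    ("?x", "ub:takesCourse", "?y"),
    ("?y", "rdf:type", "ub:Course"),
    ("?z", "ub:teacherOf", "?y"),
]
plan = plan_jobs(patterns)
print(plan)
```

For this sample query, `?y` appears in three clauses and is joined first, after which a second job joins the result with the remaining clause on `?x`, so four clauses are covered by two jobs rather than three pairwise joins.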