Abstract: Graph data management and mining techniques have plentiful and diverse applications in real-world scientific research and business activities. Uniform path pattern query processing, one of the most basic operations on graph data, faces three major challenges. This paper addresses them as follows. First, we present G-Path, a new graph query language focused on processing complex path pattern queries over very large graphs. We also propose the design of Para-G, a system built on a BSP-like model together with the MapReduce model, which can effectively handle distributed graph data operations and queries. Second, we describe the implementation of Para-G on Hadoop, the de facto cloud platform, and detail how a G-Path statement is processed in Para-G using the concept of a distributed path finite state automaton. In addition, to optimize G-Path queries, we apply several techniques that dramatically improve query execution performance. Finally, we conduct extensive experiments on several graph data sets to demonstrate the usability of the G-Path query language and the effectiveness of Para-G.
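The abstract gives no details of G-Path syntax or of Para-G's internals, so the following is only a minimal sketch, assuming a "path pattern" can be read as a regular expression over edge labels. It illustrates the general idea of driving path matching with a finite state automaton in BSP-style supersteps: each loop iteration lets every vertex process its incoming messages, then send new messages along edges whose label the automaton accepts. The names `PathAutomaton` and `match_paths`, the adjacency-list encoding, and the example pattern `knows+ worksAt` are all hypothetical and are not the paper's API or algorithm.

```python
from collections import defaultdict


class PathAutomaton:
    """Hypothetical NFA over edge labels: (state, label) -> set of next states."""

    def __init__(self, start, accepting, transitions):
        self.start = start
        self.accepting = set(accepting)
        self.transitions = transitions

    def step(self, state, label):
        return self.transitions.get((state, label), set())


def match_paths(graph, nfa):
    """Return (source, target) pairs joined by a path whose label sequence
    the automaton accepts. Each while-loop iteration plays the role of one
    BSP superstep; a message is (origin vertex, automaton state)."""
    inbox = defaultdict(set)
    for v in graph:  # every vertex starts a match in the start state
        inbox[v].add((v, nfa.start))
    seen = set()  # (vertex, origin, state) triples already processed
    results = set()
    while inbox:
        outbox = defaultdict(set)
        for v, messages in inbox.items():
            for origin, state in messages:
                if (v, origin, state) in seen:
                    continue  # skipping repeats guarantees termination on cycles
                seen.add((v, origin, state))
                if state in nfa.accepting:
                    results.add((origin, v))
                for label, target in graph.get(v, []):
                    for nxt in nfa.step(state, label):
                        outbox[target].add((origin, nxt))
        inbox = outbox  # messages are delivered at the next superstep
    return results


# Hypothetical edge-labelled graph: vertex -> list of (label, target).
graph = {
    "a": [("knows", "b")],
    "b": [("knows", "c")],
    "c": [("worksAt", "d")],
    "d": [],
}
# Pattern knows+ worksAt: 0 -knows-> 1, 1 -knows-> 1, 1 -worksAt-> 2 (accept).
nfa = PathAutomaton(
    start=0,
    accepting={2},
    transitions={(0, "knows"): {1}, (1, "knows"): {1}, (1, "worksAt"): {2}},
)
print(sorted(match_paths(graph, nfa)))  # [('a', 'd'), ('b', 'd')]
```

In a real distributed setting the `outbox` would be partitioned by target vertex and shipped between workers between supersteps; here a single dictionary stands in for that exchange.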