Abstract: The rapid proliferation of GPS-enabled devices has led to immense amounts of trajectory data being collected and analyzed. To provide insight into these datasets, a number of spatio-temporal queries need to be executed efficiently and at scale. One such important query is the Query by Path, which, given a sequence of road segments and a time interval, retrieves all trajectories that passed through those road segments within that interval. The Query by Path finds application in many areas, including traffic management, transportation planning, and fleet monitoring. In this paper, we develop an approach to partition and distribute trajectories across a cluster and execute queries by path at scale. At the center of our approach is the partitioning of the entire dataset and the indexing of each partition with a Trie. We develop a basic set of partitioning approaches and show that each can be rendered inefficient by skew in the dataset. We consequently propose a HYbrid PartitiOning algorithm (HYPO) that performs robustly in the face of skew. We also provide cost models to configure HYPO. Finally, we assess its performance extensively using both real and synthetic datasets to demonstrate that it scales well in the face of skew.
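
To make the Query by Path concrete, the following is a minimal single-partition sketch in Python, not the paper's implementation. It assumes a trajectory is a list of (segment_id, timestamp) pairs in travel order, that the queried segments must be traversed consecutively, and that every suffix of a trajectory is inserted into the Trie. The names `PathTrie`, `insert`, and `query` are hypothetical, and partitioning, distribution, and HYPO itself are not modeled.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # segment_id -> TrieNode
        self.visits = []     # (trajectory_id, entry_time, exit_time)


class PathTrie:
    """Illustrative Trie over road-segment sequences (assumed layout)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, traj_id, points):
        # points: list of (segment_id, timestamp) in travel order.
        # Every suffix is inserted so that a query path can match
        # anywhere inside the trajectory (quadratic space; sketch only).
        for i in range(len(points)):
            node = self.root
            entry_time = points[i][1]
            for segment, ts in points[i:]:
                node = node.children.setdefault(segment, TrieNode())
                node.visits.append((traj_id, entry_time, ts))

    def query(self, path, t_start, t_end):
        # Walk the Trie along the queried segments, then keep the
        # trajectories whose traversal lies inside [t_start, t_end].
        node = self.root
        for segment in path:
            node = node.children.get(segment)
            if node is None:
                return set()
        return {tid for tid, entry, exit_ in node.visits
                if t_start <= entry and exit_ <= t_end}


trie = PathTrie()
trie.insert("t1", [("a", 1), ("b", 2), ("c", 3)])
trie.insert("t2", [("b", 5), ("c", 6)])
trie.insert("t3", [("a", 1), ("c", 2)])

early = trie.query(["b", "c"], 0, 4)     # only t1 covers b -> c in [0, 4]
whole = trie.query(["b", "c"], 0, 10)    # t1 and t2 both qualify
missing = trie.query(["c", "a"], 0, 10)  # no trajectory travels c -> a
```

In this sketch the time filter is applied after the Trie walk; a real index would likely organize the visit lists by time to avoid scanning them.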