Effective, Efficient, and Generalizable Algorithms for Trajectory Similarity Queries

Yanchuan Chang

Published: 2023, Last Modified: 29 Jul 2024, License: CC BY-SA 4.0

Abstract: Trajectory data is becoming increasingly accessible due to the prevalence of GPS-equipped devices, such as smartphones and vehicles. This type of data contains rich information about people's locations and movements, which enables a wide range of location-based applications, such as carpooling, contact tracing, urban planning, traffic analysis and location-based recommendation. How to retrieve trajectories from a large volume of trajectory data effectively and efficiently has become an important area of research, which has attracted extensive interest from both academia and industry. Existing trajectory query algorithms struggle to meet the requirements of emerging applications in both effectiveness and efficiency, particularly as trajectories grow in length. These limitations underscore the pressing need for novel trajectory query algorithms. This thesis addresses that need with a focus on trajectory similarity queries, and studies four related problems. The first problem, sub-trajectory similarity join, is a new type of trajectory similarity query. While most existing studies focus on querying trajectories that are similar to each other in their entirety, we propose a new trajectory similarity measure that focuses on the partial similarity of trajectories. Specifically, we measure the length of time during which two trajectories are close (i.e., within a certain distance in space). We define the sub-trajectory similarity join query based on this measure, which returns pairs of trajectories satisfying a sub-trajectory similarity threshold. Such queries target contact tracing and carpooling applications. We present a client-server-based distributed index structure and a query algorithm with an efficient backtracking technique for the join query. Theoretical analysis and experiments on real data confirm the effectiveness and the efficiency of our proposed index structure and query algorithm.
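To make the sub-trajectory measure concrete, the following is a minimal sketch of a duration-of-closeness computation and a brute-force threshold join. It is not the thesis's index structure or backtracking algorithm. The function names `close_duration` and `similarity_join`, the `(t, x, y)` sample layout, the assumption that both trajectories are sampled at the same timestamps, and the rule that an interval counts as close only when both of its end samples are close are all illustrative assumptions.

```python
from math import hypot


def close_duration(a, b, eps):
    """Total time during which trajectories a and b stay within eps.

    Assumptions (not from the thesis): a and b are lists of (t, x, y)
    samples taken at the same timestamps, and the interval between two
    consecutive samples counts as "close" only if the two trajectories
    are within eps at both ends of that interval.
    """
    total = 0.0
    for i in range(min(len(a), len(b)) - 1):
        t0, ax0, ay0 = a[i]
        _, bx0, by0 = b[i]
        t1, ax1, ay1 = a[i + 1]
        _, bx1, by1 = b[i + 1]
        close_at_start = hypot(ax0 - bx0, ay0 - by0) <= eps
        close_at_end = hypot(ax1 - bx1, ay1 - by1) <= eps
        if close_at_start and close_at_end:
            total += t1 - t0
    return total


def similarity_join(trajs, eps, min_duration):
    """Brute-force join: index pairs whose close duration meets the threshold."""
    return [
        (i, j)
        for i in range(len(trajs))
        for j in range(i + 1, len(trajs))
        if close_duration(trajs[i], trajs[j], eps) >= min_duration
    ]
```

The brute-force join compares every pair and therefore scales quadratically in the number of trajectories; avoiding this cost is the kind of problem an index structure is meant to address.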
The second problem, road network representation learning, concerns trajectory queries on road networks. Our aim is to learn a task-agnostic road network representation that can be applied to different trajectory queries, thus avoiding the need to store multiple task-specific road network representations and improving the storage efficiency of trajectory databases. Existing road network representation learning approaches cannot satisfy this goal, since most are supervised and learn task-specific road network representations. Further, these approaches use generic graph neural network methods to learn graph representations based on topological features while ignoring the spatial features of road networks, which are important to trajectory applications. To address these issues, we propose a self-supervised contrastive learning method to learn generic and task-agnostic road network representations. We devise four novel modules to learn spatial features and spatial correlations of road networks. Once trained, the road network representation can be directly applied to different trajectory queries without any fine-tuning. Experimental results on different trajectory queries, such as trajectory similarity measurement and shortest-path-based trajectory route planning, show that the proposed model consistently outperforms state-of-the-art self-supervised models and even achieves performance comparable to that of supervised models. The third problem, trajectory similarity learning, concerns trajectory representation for trajectory similarity measurement in Euclidean space. Our aim is to learn a trajectory representation that enables effective and efficient similarity evaluation between two given trajectories, which is a core operator in trajectory query processing. Motivated by the strong representation learning capability of contrastive learning, we again propose a contrastive learning-based method.
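For readers unfamiliar with contrastive learning, the following is a minimal sketch of one widely used contrastive objective, InfoNCE. The abstract does not state which loss the thesis uses, so the choice of InfoNCE, the cosine similarity, the default temperature, and the names `cosine` and `info_nce` are all assumptions made for illustration.

```python
from math import exp, log, sqrt


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor embedding (illustrative, not the thesis's loss).

    The loss is low when the anchor is more similar to its positive
    (e.g. an augmented view of the same trajectory) than to the negatives
    (e.g. views of other trajectories).
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + log(sum(exp(z - m) for z in logits))
    return log_denominator - logits[0]
```

Minimizing this quantity over many anchors pulls embeddings of augmented views of the same object together and pushes embeddings of different objects apart.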
We design a novel dual-feature self-attention-based trajectory backbone encoder and four trajectory-specific augmentation methods for trajectory representation learning, to encode both the coarse-grained and the fine-grained spatial properties of trajectories into the learned representations. Once trained, the backbone encoder can be used on its own for trajectory representation computation and similarity estimation. It can also be fine-tuned to approximate traditional heuristic trajectory similarity measures such as the Fréchet distance. Experimental results show that our proposed approach produces trajectory representations that lead to consistently more accurate trajectory similarity measures than those of the state-of-the-art approaches. The fourth problem concerns an in-depth analysis of existing trajectory similarity measures. Our aim is to provide a comprehensive comparison of the heuristic trajectory similarity measures and the deep learning-based ones from an efficiency perspective, and to analyze their strengths, limitations, and applicable scenarios. Recent studies on learned trajectory similarity measures focus on how to accurately approximate heuristic trajectory similarity measures and have largely overlooked efficiency. We implement deep learning-based and heuristic approaches on both CPU and GPU for a fair efficiency comparison. Experimental results show that heuristic approaches run faster than deep learning-based approaches when measuring the similarity between two trajectories on both CPU and GPU without any pre-computation. Once trajectory embeddings are given and can be reused, some deep learning-based approaches can achieve better computational efficiency than the heuristic ones. We also conduct experiments on kNN queries using dedicated index structures, where the deep learning-based approaches consistently outperform the heuristic ones in efficiency.
This study clearly shows which class of methods should be applied for which purpose under a given set of experimental circumstances.
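The efficiency contrast between heuristic and embedding-based measures can be illustrated with a minimal sketch. It shows a standard dynamic-programming computation of the discrete Fréchet distance, whose cost grows with the product of the two trajectory lengths, next to a distance between two precomputed embedding vectors, whose cost depends only on the embedding dimension. The function names, the use of L1 distance for embeddings, and the assumption that embeddings are already available are illustrative choices, not details taken from the thesis.

```python
from math import hypot


def discrete_frechet(p, q):
    """Discrete Fréchet distance between point sequences p and q.

    p and q are non-empty lists of (x, y) points. Runs in
    O(len(p) * len(q)) time using a full dynamic-programming table.
    """
    n, m = len(p), len(q)
    dp = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = hypot(p[i][0] - q[j][0], p[i][1] - q[j][1])
            if i == 0 and j == 0:
                dp[i][j] = cost
            elif i == 0:
                dp[i][j] = max(dp[0][j - 1], cost)
            elif j == 0:
                dp[i][j] = max(dp[i - 1][0], cost)
            else:
                best_prev = min(dp[i - 1][j], dp[i - 1][j - 1], dp[i][j - 1])
                dp[i][j] = max(best_prev, cost)
    return dp[n - 1][m - 1]


def embedding_distance(u, v):
    """L1 distance between two precomputed embeddings of equal dimension.

    Assumption: u and v were produced earlier by some trained encoder
    (not shown here) and are being reused across queries.
    """
    return sum(abs(x - y) for x, y in zip(u, v))
```

This sketch leaves out the cost of producing the embeddings in the first place, which matters for the comparison without pre-computation described above.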