Traj2Former: A Local Context-aware Snapshot and Sequential Dual Fusion Transformer for Trajectory Classification
Abstract: The widespread use of mobile devices has led to a proliferation of trajectory data, making trajectory classification increasingly vital and challenging for downstream applications. Existing deep learning methods offer powerful feature extraction capabilities for detecting nuanced differences between trajectories in classification tasks. However, their effectiveness remains limited by two unsolved challenges. First, the distribution of nearby trajectories provides critical contextual features for classification, yet identifying it from noisy and sparse GPS coordinates is difficult. Second, although efforts have been made to incorporate shape features by rendering trajectories into images, they fail to model the local correspondence between GPS points and image pixels. To address these issues, we propose a novel model, Traj2Former, that spotlights the spatial distribution of adjacent trajectory points (i.e., the contextual snapshot) and enhances the snapshot fusion between the trajectory data and the corresponding spatial contexts. We propose a new GPS rendering method to generate contextual snapshots, which can be applied to sources ranging from a trajectory database to a digital map. Moreover, to capture diverse temporal patterns, we conduct a multi-scale sequential fusion by compressing the trajectory data at different rates. Extensive experiments verify the superiority of the Traj2Former model.
Primary Subject Area: [Content] Multimodal Fusion
Secondary Subject Area: [Content] Multimodal Fusion
Relevance To Conference: In our study, we developed a multi-scale fusion framework that integrates data from different modalities, specifically sequential trajectory data and image-based global map data. This methodology underscores the importance of effective multimodal data integration, a key aspect of multimedia processing. Trajectory data (time and movement) and information from map sources (semantic and topological information) constitute different data modalities that, when combined, offer both a holistic view and local context information.
Supplementary Material: zip
Submission Number: 3486