Data-Efficient and Robust Trajectory Generation through Pathlet Dictionary Learning

Published: 22 Jan 2026, Last Modified: 06 Mar 2026 | CPAL 2026 (Proceedings Track) Oral | CC BY 4.0
Keywords: trajectory generative model, dictionary learning, sparse representation
TL;DR: We propose a novel framework that combines a deep neural network with dictionary learning, using a probabilistic graphical model to enhance robustness and interpretability in trajectory generation and various downstream tasks.
Abstract: Trajectory generation has recently drawn growing interest in privacy-preserving urban mobility studies and location-based service applications. Although many studies have used deep learning or generative AI methods to model trajectories and achieved promising results, real-world trajectory data are noisy and often incomplete (e.g., due to device instability, low sampling rates, or privacy-driven partial reporting), introducing distribution shifts and, as observed in our experiments, marked differences between synthetic and real trajectory distributions. To address this issue, we exploit the low-dimensional structure and regular patterns in urban trajectories and propose a parsimonious deep generative model based on sparse pathlet representations, which encode trajectories as sparse binary vectors associated with a learned compact dictionary of trajectory segments. Specifically, we introduce a probabilistic graphical model to describe the trajectory generation process, which includes a Variational Autoencoder (VAE) component and a linear decoder component. During training, the model simultaneously learns the latent embedding of sparse pathlet representations and the pathlet dictionary that captures essential mobility patterns in the trajectory dataset. The conditional version of our model can also generate customized trajectories based on temporal and spatial constraints. Our model can effectively learn the data distribution even from noisy data, achieving relative improvements of 35.4% and 26.3% over strong baselines on two real-world trajectory datasets. Moreover, the generated trajectories can be conveniently used for multiple downstream tasks, including trajectory prediction and data denoising. Lastly, the framework design offers a significant efficiency advantage, saving 64.8% of the time and 56.5% of GPU memory compared to previous approaches.
The code repository is available at https://anonymous.4open.science/r/Data-Efficient-and-Robust-Trajectory-Generation-through-Pathlet-Dictionary-Learning-045E.
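The sparse pathlet representation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes, purely for illustration, that a trajectory is a binary indicator vector over six road segments, that each dictionary entry (pathlet) is a binary vector over the same segments, and that decoding is a plain sum of the selected pathlets. The greedy `encode_greedy` function is a hypothetical stand-in for the learned VAE encoder, and `DICTIONARY` is hand-written rather than learned.

```python
from typing import List

# Hypothetical toy dictionary: 4 pathlets over 6 road segments.
# Each row is one pathlet, stored as a binary indicator vector.
DICTIONARY: List[List[int]] = [
    [1, 1, 0, 0, 0, 0],  # pathlet 0 covers segments 0 and 1
    [0, 0, 1, 1, 0, 0],  # pathlet 1 covers segments 2 and 3
    [0, 0, 0, 0, 1, 1],  # pathlet 2 covers segments 4 and 5
    [0, 1, 1, 0, 0, 0],  # pathlet 3 covers segments 1 and 2
]


def decode(code: List[int], dictionary: List[List[int]] = DICTIONARY) -> List[int]:
    """Linear decoding: sum the pathlets selected by the binary code."""
    out = [0] * len(dictionary[0])
    for use, pathlet in zip(code, dictionary):
        if use:
            for i, v in enumerate(pathlet):
                out[i] += v
    return out


def encode_greedy(trajectory: List[int],
                  dictionary: List[List[int]] = DICTIONARY) -> List[int]:
    """Greedy stand-in for a learned encoder (an assumption, not the paper's method).

    Visits pathlets from longest to shortest and selects each one that fits
    entirely inside the part of the trajectory not yet covered.
    """
    remaining = list(trajectory)
    code = [0] * len(dictionary)
    order = sorted(range(len(dictionary)), key=lambda k: -sum(dictionary[k]))
    for k in order:
        pathlet = dictionary[k]
        if all(r >= v for r, v in zip(remaining, pathlet)):
            code[k] = 1
            remaining = [r - v for r, v in zip(remaining, pathlet)]
    return code


trajectory = [1, 1, 1, 1, 0, 0]       # traverses segments 0 through 3
code = encode_greedy(trajectory)      # sparse binary pathlet code
reconstruction = decode(code)         # linear reconstruction from the code
```

In this toy case two of the four pathlets suffice to reproduce the trajectory, which is the sense in which the code is sparse. In the model described above, the dictionary and the encoder would both be learned from data rather than fixed by hand.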
Submission Number: 84