Abstract: Trajectory-User Linking (TUL) is a critical task in mobility applications that links unlabeled spatial trajectories to the users or entities that generated them. In these applications, data often arrives as a continuous stream and may experience distributional shifts over time. While adapting TUL models via online learning could address these challenges, this approach remains unexplored in current research. Our work bridges this gap by conducting comprehensive evaluations of common TUL techniques in an online learning context. To improve the performance of existing TUL techniques in this setting, we propose Multi-Level Spatial Embedding Sharing (MiLES), an embedding approach that adapts and extends the principle of multi-scale spatial sharing for online TUL. MiLES partially shares embeddings across neighborhoods of multiple size levels, enabling generalization within neighborhoods while maintaining fine-grained discrimination through more location-specific representations. MiLES also significantly reduces the number of embedding parameters, leading to lower memory usage and more computationally efficient model updates. We further incorporate learnable weighting parameters for each embedding level, allowing the model to learn the influence of different levels during training. Our experimental results on several real-world datasets show that integrating MiLES into state-of-the-art TUL models significantly improves their performance in online learning scenarios, yielding relative gains in top-1 accuracy of up to 24\%, with consistent improvements observed across other training paradigms as well. However, the online gains are particularly relevant, as our findings suggest that online learning is the most suitable paradigm for real-time TUL on streaming data, outperforming periodic batch retraining at substantially lower computational cost.
To demonstrate its general applicability, we also evaluate MiLES on the task of destination prediction, where it provides consistent performance improvements, confirming its value as a domain-general embedding technique. Our code is available at \url{https://anonymous.4open.science/r/MiLES-3D20}.
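The idea of sharing embeddings across neighborhoods of several sizes, combined through learnable per-level weights, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a square grid, neighborhoods formed by halving the grid resolution at each level, and a softmax over per-level logits. The class name `MultiLevelGridEmbedding` and all of its parameters are hypothetical, and the sketch does not reproduce the parameter reduction described in the abstract.

```python
import math
import random


class MultiLevelGridEmbedding:
    """Illustrative only: one embedding table per grid level, mixed by level weights."""

    def __init__(self, grid_size, num_levels, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.num_levels = num_levels
        # Assumption: level l merges 2**l x 2**l fine cells into one neighborhood.
        self.sides = [math.ceil(grid_size / 2 ** l) for l in range(num_levels)]
        self.tables = [
            [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(side * side)]
            for side in self.sides
        ]
        # One logit per level; in a real model these would be trained with the tables.
        self.level_logits = [0.0] * num_levels

    def level_weights(self):
        # Numerically stable softmax over the level logits.
        m = max(self.level_logits)
        exps = [math.exp(z - m) for z in self.level_logits]
        total = sum(exps)
        return [e / total for e in exps]

    def cell_index(self, row, col, level):
        # Row-major index of the neighborhood containing (row, col) at this level.
        return (row >> level) * self.sides[level] + (col >> level)

    def embed(self, row, col):
        # Weighted sum of the fine cell's vector and its coarser neighborhood vectors.
        weights = self.level_weights()
        out = [0.0] * self.dim
        for level, w in enumerate(weights):
            vec = self.tables[level][self.cell_index(row, col, level)]
            for k in range(self.dim):
                out[k] += w * vec[k]
        return out
```

In this sketch, two nearby cells such as `(4, 4)` and `(5, 5)` on an 8x8 grid use the same rows of the coarser tables, so their embeddings differ only through the finest-level term. That is one simple way to obtain generalization within a neighborhood while keeping a location-specific component.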
Submission Type: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=LGflWbxAuP
Changes Since Last Submission: This submission has undergone a revision based on
feedback from a previous review process. The most significant
addition is a new experiment comparing online learning against
periodic batch retraining. A full list of changes follows.
## New Experiments & Analyses
- Comparison of online learning against periodic batch
retraining with intervals of 2,500 and 5,000 trajectories
across all three datasets (see Pages 14 & 15).
Online learning consistently outperforms retraining while
requiring approximately 120x fewer forward and backward passes.
- Rolling top-1 accuracy analysis over the data stream for
Foursquare-NYC and GeoLife with confidence bands, showing
consistent stream-wise advantage of MiLES (Figure 2).
- Batch vs. online comparison with separately tuned
hyperparameters, confirming that MiLES's gains are
consistently larger in the online regime (see Page 15).
## Revised Framing and Positioning
- Revised introduction and related work to explicitly
acknowledge the lineage of grid-based and multi-scale spatial
encoding methods. Added discussion of Space2Vec (Mai et al., 2020) and Fourier features (Tancik et al., 2020). MiLES is
now positioned as adapting and extending multi-scale spatial
sharing for online TUL rather than introducing a fully new
modeling principle.
- Narrowed concept drift claims throughout. The conclusion now
separates the well-supported data-scarcity finding from the
more tentative drift robustness finding.
- Expanded appendix with two new descriptive analyses: KL
divergence of POI distributions on Foursquare-NYC and label
entropy on GeoLife.
## Clarifications and Expanded Discussion
- Added explicit justification of the hyperparameter tuning
protocol as a deliberate design choice reflecting online
deployment constraints.
- Added discussion of pre-allocation vs. dynamic expansion in
the approach section and limitations.
- Expanded limitations section to explain when MiLES is and
is not beneficial, and to discuss alternative retraining
strategies as future work.
- Added end-to-end pipeline description in the method section.
- Replaced all references to "attention mechanism" with
"learnable level weighting" throughout.
Assigned Action Editor: ~Pablo_Samuel_Castro1
Submission Number: 8765