HiT-JEPA: A Hierarchical Self-supervised Trajectory Embedding Framework for Similarity Computation

Lihuan Li; Hao Xue; Shuang Ao; Yang Song; Flora D. Salim

20 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: urban trajectory representation learning, trajectory similarity computation, hierarchical self-supervised learning, transformers, joint embedding predictive architecture

Abstract: The representation of urban trajectory data plays a critical role in effectively analyzing spatial movement patterns. Despite considerable progress, designing trajectory representations that capture diverse and complementary information remains an open research problem. Existing methods struggle to incorporate both fine-grained trajectory details and high-level summaries in a single model, which limits their ability to attend to long-term dependencies while preserving local nuances. To address this, we propose HiT-JEPA (Hierarchical Interactions of Trajectory Semantics via a Joint Embedding Predictive Architecture), a unified framework for learning multi-scale urban trajectory representations across semantic abstraction levels. HiT-JEPA adopts a three-layer hierarchy that progressively captures point-level fine-grained details, intermediate patterns, and high-level trajectory abstractions, enabling the model to integrate local dynamics and global semantics in one coherent structure. Extensive experiments on multiple real-world datasets for trajectory similarity computation show that HiT-JEPA's hierarchical design yields richer, multi-scale representations.

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 23718
