CLMTR: a generic framework for contrastive multi-modal trajectory representation learning

Published: 2025 · Last Modified: 07 Jan 2026 · GeoInformatica 2025 · CC BY-SA 4.0
Abstract: Multi-modal trajectory representation learning aims to convert raw trajectories into low-dimensional embeddings that facilitate downstream trajectory analysis tasks. However, existing methods focus on spatio-temporal trajectories and often neglect additional modal features such as textual or image data. Moreover, these methods do not fully consider the correlations among different modal features or the relationships among trajectories, hindering the generation of generic and semantically enriched representations. To address these limitations, we propose a generic Contrastive Learning-based Multi-modal Trajectory Representation framework, termed CLMTR. Specifically, we incorporate intra- and inter-trajectory contrastive learning components to capture the correlations among diverse modal features and the intricate relationships among trajectories, yielding generic and semantically enriched trajectory representations. We develop multi-modal feature embedding and attention-based fusion approaches to capture the multi-modal characteristics and adaptively obtain the unified embeddings. Experimental results on two real-world datasets demonstrate the superior performance of CLMTR over state-of-the-art methods in three downstream tasks.
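To make the two core ideas in the abstract concrete, below is a minimal, illustrative sketch (not the authors' code, whose details are not given here) of attention-based fusion of per-modality trajectory embeddings into a unified embedding, paired with an InfoNCE-style contrastive loss over trajectory pairs, a standard formulation for inter-trajectory contrastive learning. All names (`ModalAttentionFusion`, `info_nce`, `embed_dim`) and the specific attention design are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalAttentionFusion(nn.Module):
    """Fuse M per-modality embeddings into one trajectory embedding via
    learned attention weights (a common design; details here are assumed)."""
    def __init__(self, embed_dim: int):
        super().__init__()
        self.score = nn.Linear(embed_dim, 1)  # scalar score per modality embedding

    def forward(self, modal_embs: torch.Tensor) -> torch.Tensor:
        # modal_embs: (batch, num_modalities, embed_dim)
        weights = torch.softmax(self.score(modal_embs), dim=1)  # (B, M, 1)
        return (weights * modal_embs).sum(dim=1)                # (B, D)

def info_nce(anchor: torch.Tensor, positive: torch.Tensor,
             tau: float = 0.1) -> torch.Tensor:
    """InfoNCE: each anchor's positive is the same-index row of `positive`;
    all other rows in the batch serve as negatives."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / tau                          # (B, B) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

# Usage: fuse spatio-temporal, textual, and image embeddings per trajectory,
# then contrast two augmented "views" of the same batch of trajectories.
B, D = 32, 128
fusion = ModalAttentionFusion(embed_dim=D)
view1 = fusion(torch.randn(B, 3, D))  # 3 modality embeddings per trajectory
view2 = fusion(torch.randn(B, 3, D))
loss = info_nce(view1, view2)
loss.backward()
```

The softmax over modalities lets the model adaptively weight spatio-temporal, textual, and image features per trajectory, matching the abstract's "adaptively obtain the unified embeddings"; the intra-trajectory component described in the paper would add a second contrastive term across modalities within one trajectory, which this sketch omits.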