Path-LLM: A Multi-Modal Path Representation Learning by Aligning and Fusing with Large Language Models
Track: Systems and infrastructure for Web, mobile, and WoT
Keywords: Path representation learning, Large language models, Curriculum learning, Contrastive learning
Abstract: The advancement of intelligent transportation systems has led to a growing demand for accurate path representations, which are essential for tasks such as travel time estimation, path ranking, and trajectory analysis. However, traditional path representation learning (PRL) methods often focus solely on single-modal road network data, overlooking important physical and regional factors that influence real-world traffic dynamics. To overcome this limitation, we introduce Path-LLM, a multi-modal path representation learning model that integrates large language models (LLMs) into PRL. Our approach leverages LLMs to interpret both topological and textual data, enabling robust multi-modal path representations. To effectively align and merge these modalities, we propose TPalign, a contrastive learning-based pretraining strategy that ensures alignment within the embedding space. We then present TPfusion, a multi-modal fusion module that dynamically adjusts the weight of each modality before integration. To further optimize LLM training, we introduce a Two-stage Overlapping Curriculum Learning (TOCL) approach, which progressively increases the complexity of the training data. Finally, we evaluate Path-LLM on two real-world datasets across traditional PRL downstream tasks, achieving up to a 61.84% improvement in path ranking performance on the Xi'an dataset. Additionally, Path-LLM demonstrates superior performance in both few-shot and zero-shot learning scenarios. Our code is available at: https://anonymous.4open.science/r/Path-LLM-F053.
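The abstract describes two technical components at a high level: a contrastive alignment objective between topological and textual path embeddings (TPalign) and a fusion module that dynamically weights each modality before merging (TPfusion). The following minimal sketch illustrates one plausible reading of those ideas; it is not the authors' implementation, and the InfoNCE-style loss, the gating design, and all names (contrastive_align_loss, GatedFusion, temperature) are assumptions made for illustration only.

```python
# Illustrative sketch of TPalign-style alignment and TPfusion-style fusion,
# under assumed design choices (InfoNCE loss, softmax gating); not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def contrastive_align_loss(topo_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss pulling matched (topology, text) embeddings of the same path together."""
    topo = F.normalize(topo_emb, dim=-1)          # (B, D)
    text = F.normalize(text_emb, dim=-1)          # (B, D)
    logits = topo @ text.t() / temperature        # (B, B) pairwise similarities
    targets = torch.arange(topo.size(0), device=topo.device)
    # Matched pairs lie on the diagonal; treat alignment as classification in both directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


class GatedFusion(nn.Module):
    """Learns per-modality weights and merges the two embeddings into one path representation."""

    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=-1))

    def forward(self, topo_emb, text_emb):
        w = self.gate(torch.cat([topo_emb, text_emb], dim=-1))   # (B, 2) modality weights
        return w[:, :1] * topo_emb + w[:, 1:] * text_emb         # (B, D) fused embedding


if __name__ == "__main__":
    B, D = 32, 256
    topo, text = torch.randn(B, D), torch.randn(B, D)
    print(contrastive_align_loss(topo, text).item())
    print(GatedFusion(D)(topo, text).shape)
```

The sketch keeps alignment (a pretraining loss over paired modalities) and fusion (a learned weighting applied at merge time) as separate steps, mirroring the order in which the abstract introduces TPalign and TPfusion.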
Submission Number: 2472