CMAAN: Cross-Modal Aggregation Attention Network for Next POI Recommendation

Zhuang Zhuang, Lingbo Liu, Heng Qi, Yanming Shen, Baocai Yin

Published: 2025, Last Modified: 15 Jan 2026. IEEE Trans. Comput. Soc. Syst. 2025. License: CC BY-SA 4.0

Abstract: Next point-of-interest (POI) recommendation aims to exploit historical check-in sequences in location-based social networks (LBSNs) to recommend the next location that a user might be interested in. However, most previous methods use only the limited information in unimodal data (i.e., check-in sequences), while some recent methods have attempted to explore multimodal data (e.g., textual content) but lack sufficient interaction between geographic behavior patterns and content behavior patterns. In this work, we argue that users usually consider geographical trajectories and textual content interdependently when deciding on the next location to visit. To this end, we propose a novel cross-modal aggregation attention network (CMAAN), which interactively learns multiview representations from the POI sequence and the content sequence to predict the next POI. Our approach simultaneously models inter-modal interaction correlations, intra-modal sequence correlations, and intra-modal semantic correlations to fully discover potential contextual relations along the trajectories. Specifically, the intra-modal semantic correlations capture how location functionalities vary under the different contextual relationships found in cross-modal interaction information. Moreover, we apply aggregation attention to adaptively aggregate the multiview representations into a comprehensive hidden state for the next POI. Extensive experiments on two large-scale datasets clearly demonstrate that our CMAAN achieves state-of-the-art performance.
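
The abstract does not specify how the aggregation attention is computed, so the following is only a minimal illustrative sketch of one common way to fuse several view representations with attention, not the authors' implementation. It assumes scaled dot-product scoring against a query vector, and the names `aggregate_views`, `rank_pois`, `views`, `query`, and `poi_embeddings` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def aggregate_views(views, query):
    """Fuse several view vectors into one hidden state (assumed scheme).

    Each view is scored against the query with a scaled dot product,
    the scores are normalized with softmax, and the views are summed
    using those weights.
    """
    d = len(query)
    scores = [dot(query, v) / math.sqrt(d) for v in views]
    weights = softmax(scores)
    fused = [sum(w * v[i] for w, v in zip(weights, views)) for i in range(d)]
    return fused, weights


def rank_pois(fused, poi_embeddings):
    """Rank candidate POIs by dot-product similarity to the fused state."""
    scored = {poi: dot(fused, emb) for poi, emb in poi_embeddings.items()}
    return sorted(scored, key=scored.get, reverse=True)
```

In this sketch, the three entries of `views` could stand for the inter-modal interaction, intra-modal sequence, and intra-modal semantic representations named in the abstract. The softmax weights let the fused state lean toward whichever view best matches the query, which is one plausible reading of "adaptively aggregate".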

External IDs: dblp:journals/tcss/ZhuangLQSY25