Abstract: With ever-increasing urbanization, modeling people's spatiotemporal activities from their online traces has become a crucial task. State-of-the-art methods for this task rely on cross-modal embedding, which maps items from different modalities (e.g., location, time, text) into the same latent space. Despite their promising results, existing cross-modal embedding methods merely capture co-occurrences between items without modeling their high-order interactions. In this paper, we first construct a user interaction graph and an activity graph from the raw data records and then propose a hierarchical cross-modal embedding method that takes these high-order relationships into consideration. We introduce both inter-record and intra-record meta-graph structures, which enable learning distributed representations that preserve high-order proximities across graphs at different layers. Empirical results on three real-world datasets demonstrate that our method not only outperforms state-of-the-art methods for spatiotemporal activity prediction but also captures cross-modal proximity at a finer granularity.