Revealing the Power of Masked Autoencoders in Traffic Forecasting

Jiarui Sun, Yujie Fan, Chin-Chia Michael Yeh, Wei Zhang, Girish Chowdhary

Published: 2024, Last Modified: 15 May 2025, Venue: CIKM 2024, License: CC BY-SA 4.0

Abstract: Traffic forecasting, crucial for urban planning, requires accurate predictions of spatial-temporal traffic patterns across urban areas. Existing research mainly focuses on designing complex spatial-temporal models to capture these dependencies. However, the field faces challenges of data scarcity and model stability, which limit performance gains. To address these issues, we propose Spatial-Temporal Masked AutoEncoders (STMAE), a plug-and-play framework designed to enhance existing spatial-temporal models for traffic prediction. STMAE operates in two stages. In the pretraining stage, an encoder processes partially visible traffic data produced by a dual-masking strategy that combines biased random walk-based spatial masking with patch-based temporal masking. Two decoders then reconstruct the masked portions from the spatial and temporal perspectives, respectively. The fine-tuning stage retains the pretrained encoder and integrates it with decoders from existing backbones to improve traffic forecasting accuracy. Our results on traffic benchmarks show that STMAE can substantially improve the forecasting capabilities of various spatial-temporal models.
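
The dual-masking idea can be illustrated with a minimal sketch. The abstract does not specify the masking details, so everything below is an assumption made for illustration: the function names, the `mask_ratio`, `patch_len`, and `bias` parameters, the rule that the walk favours not-yet-masked neighbours, and the restart when the walk reaches a dead end. It is not the authors' implementation.

```python
import random


def patch_temporal_mask(num_steps, patch_len, mask_ratio, rng):
    """Return one boolean per time step; True marks a masked step.

    The time axis is split into consecutive patches of `patch_len` steps,
    and whole patches are masked together (hypothetical scheme).
    """
    num_patches = (num_steps + patch_len - 1) // patch_len
    num_masked = round(mask_ratio * num_patches)
    masked_patches = set(rng.sample(range(num_patches), num_masked))
    return [(t // patch_len) in masked_patches for t in range(num_steps)]


def random_walk_spatial_mask(adj, mask_ratio, rng, bias=4.0):
    """Return the set of sensor nodes masked by a biased random walk.

    `adj` maps each node to a list of its neighbours. The bias used here
    is an assumption: unmasked neighbours are `bias` times more likely to
    be visited than masked ones, so the mask spreads over a connected
    region of the graph.
    """
    nodes = list(adj)
    target = round(mask_ratio * len(nodes))
    masked = set()
    if target == 0:
        return masked
    current = rng.choice(nodes)
    masked.add(current)
    while len(masked) < target:
        neighbours = adj[current]
        if not any(n not in masked for n in neighbours):
            # Dead end (assumed handling): restart from a random unmasked node.
            current = rng.choice([n for n in nodes if n not in masked])
        else:
            weights = [bias if n not in masked else 1.0 for n in neighbours]
            current = rng.choices(neighbours, weights=weights, k=1)[0]
        masked.add(current)
    return masked


# Toy example: 6 sensors on a ring road, 12 time steps.
rng = random.Random(0)
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
time_mask = patch_temporal_mask(num_steps=12, patch_len=3, mask_ratio=0.5, rng=rng)
node_mask = random_walk_spatial_mask(ring, mask_ratio=0.5, rng=rng)
```

In this sketch, `time_mask` hides contiguous blocks of time steps and `node_mask` hides a set of graph nodes; an encoder would see only the remaining data, and the two decoders described above would be trained to reconstruct what was hidden.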