Abstract: Water quality forecasting is a time-series analysis task that estimates future water conditions and is vital for environmental management and pollution control. However, existing time-series methods rely only on historical observational data and neglect information from other modalities. This leads to incomplete feature extraction and limits forecasting accuracy and robustness. In addition, the complex spatial dependencies between water quality monitoring stations and the nonlinear fluctuations in water quality indicators caused by meteorological factors pose further challenges. To address these issues, this work proposes spatiotemporal multimodal fusion (STMF), an architecture for long-term water quality forecasting. STMF first captures spatiotemporal dependencies by integrating temporal features with the upstream–downstream relationships among monitoring stations. It then applies a low-rank cross-modal interaction fusion (LRCIF) method that fuses these spatiotemporal features with precipitation features extracted from remote-sensing imagery, treated as an additional modality. This lets the model exploit complementary information from multiple data sources to improve the accuracy and stability of the forecasts. Experimental results on real-world water quality datasets show that STMF significantly outperforms existing state-of-the-art methods in prediction accuracy. In particular, for long-term forecasting with a 192-step horizon, STMF reduces mean squared error (MSE) and mean absolute error (MAE) by 14% and 12%, respectively, compared with unimodal models, which further validates the effectiveness of the multimodal fusion strategy. Overall, STMF offers an effective solution for water quality monitoring and management.
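The abstract does not describe LRCIF's internals. As a hedged illustration only, the sketch below assumes a low-rank factorized bilinear fusion in the style of Low-rank Multimodal Fusion (LMF; Liu et al., 2018). That is a swapped-in technique, not necessarily the authors' method. The function name `low_rank_fusion`, the feature sizes, and the rank `R` are hypothetical. Each modality vector is padded with a constant 1, projected by `R` modality-specific factors, multiplied elementwise across modalities, and summed over ranks with learned weights. This is mathematically equal to contracting both vectors with a full rank-`R` weight tensor, without ever materializing that tensor.

```python
import numpy as np


def low_rank_fusion(h_st, h_rain, factors_st, factors_rain, rank_weights, bias):
    """Hypothetical low-rank bilinear fusion of two modalities (LMF-style sketch).

    h_st:         (batch, d_st)             spatiotemporal features (assumed)
    h_rain:       (batch, d_rain)           precipitation features (assumed)
    factors_st:   (R, d_st + 1, d_out)      rank-wise factors for modality 1
    factors_rain: (R, d_rain + 1, d_out)    rank-wise factors for modality 2
    rank_weights: (R,)                      weights for summing the ranks
    bias:         (d_out,)
    Returns:      (batch, d_out) fused representation
    """
    batch = h_st.shape[0]
    ones = np.ones((batch, 1))
    # Append a constant 1 so unimodal terms survive the multiplicative interaction.
    z_st = np.concatenate([h_st, ones], axis=1)
    z_rain = np.concatenate([h_rain, ones], axis=1)
    # Project each modality through every rank-r factor: shape (R, batch, d_out).
    p_st = np.einsum("bd,rdo->rbo", z_st, factors_st)
    p_rain = np.einsum("bd,rdo->rbo", z_rain, factors_rain)
    # Elementwise product = cross-modal interaction; weighted sum over the R ranks.
    return np.einsum("r,rbo->bo", rank_weights, p_st * p_rain) + bias


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, d_st, d_rain, d_out, rank = 4, 8, 3, 5, 2
    out = low_rank_fusion(
        rng.normal(size=(batch, d_st)),
        rng.normal(size=(batch, d_rain)),
        rng.normal(size=(rank, d_st + 1, d_out)),
        rng.normal(size=(rank, d_rain + 1, d_out)),
        rng.normal(size=rank),
        np.zeros(d_out),
    )
    print(out.shape)  # (4, 5)
```

The appeal of this design is parameter efficiency. A full bilinear tensor needs `(d_st + 1) * (d_rain + 1) * d_out` weights, while the factorized form needs `R * (d_st + d_rain + 2) * d_out` weights plus `R` rank weights, which is far smaller when the feature sizes are large and `R` is small.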
External IDs: doi:10.1109/jiot.2025.3581282