Is Adding More Modalities Better in a Multimodal Spatio-temporal Prediction Scenario? A Case Study on Japan Air Quality

Yutaro Mishima, Guillaume Habault, Shinya Wada

MobiQuitous 2021 (modified: 04 Nov 2022)

Abstract: Several spatio-temporal datasets are now available for research purposes (e.g., location, traffic, or meteorology datasets). These datasets are increasingly used as multimodal inputs to neural networks for spatio-temporal prediction. However, few methods include functions that explicitly capture cross-modal relationships. This gap will become a serious problem when more complex modalities, and the dependencies among them, need to be taken into account. Since more spatio-temporal datasets are expected to become available, it is crucial to tackle this problem. In this paper, we conduct preliminary experiments to check whether an existing multimodal spatio-temporal network performs better when another modality is added. These experiments compare air quality forecasting performance on a trimodal spatio-temporal dataset. The comparison covers several methods, in particular one that has been modified to handle multiple modalities. Based on the results, we confirm that prediction performance does not improve when another modality is simply added. Methods that capture complex cross-modal relationships are therefore required.
