Self-supervised air quality estimation with graph neural network assistance and attention enhancement

Viet Hung Vu, Duc Long Nguyen, Thanh Hung Nguyen, Quoc Viet Hung Nguyen, Phi Le Nguyen, Thanh Trung Huynh

Published: 01 Jan 2024, Last Modified: 25 Jan 2025. Neural Comput. Appl. 2024. License: CC BY-SA 4.0

Abstract: The rapid progress of industrial development, urbanization, and traffic has caused air quality degradation that negatively affects human health and environmental sustainability, especially in developed countries. However, due to the limited number of sensors available, the air quality index is not monitored at many locations. Therefore, many studies, including statistical and machine learning approaches, have been proposed to tackle the problem of estimating the air quality value at an arbitrary location. Most existing studies perform interpolation using traditional techniques that leverage distance information. In this work, we propose a novel deep-learning-based model for air quality value estimation. This approach follows the encoder–decoder paradigm, with the encoder and decoder trained separately using different training mechanisms. In the encoder component, we propose a new self-supervised graph representation learning approach for spatio-temporal data. For the decoder component, we design a deep interpolation layer that employs two attention mechanisms and a fully connected layer, using air quality data at known stations, distance information, and meteorology information at the target point to predict air quality at arbitrary locations. The experimental results demonstrate significant improvements in estimation accuracy achieved by our proposed model compared to state-of-the-art approaches. For the MAE indicator, our model improves estimation accuracy by 4.93% to 34.88% on the UK dataset, and by 6.89% to 31.94% on the Beijing dataset. In terms of RMSE, the average improvements of our method on the two datasets are 13.33% and 14.37%, respectively. The corresponding figures for MAPE are 36.05% and 13.25%, while for MDAPE they are 24.48% and 36.33%, respectively. Furthermore, the \(R^2\) score attained by our proposed model also shows considerable improvement, with increases of 5.39% and 32.58% over the comparison benchmarks. Our source code and data are available at https://github.com/duclong1009/Unsupervised-Air-Quality-Estimation.
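For orientation, the following is a minimal sketch of the kind of traditional distance-based interpolation the abstract contrasts against, together with the five reported metrics. It is not the proposed model and is not taken from the linked repository. Inverse distance weighting is used here as one assumed representative of "traditional techniques that leverage distance information". The function names, the planar `(x, y)` coordinates, and the `power` parameter are hypothetical choices made for illustration.

```python
import math
from statistics import mean, median


def idw_estimate(target, stations, power=2.0):
    """Inverse-distance-weighted estimate at `target` (hypothetical helper).

    target:   (x, y) coordinates of the unmonitored location (assumed planar).
    stations: list of ((x, y), value) pairs for monitored locations.
    """
    num = 0.0
    den = 0.0
    for (x, y), value in stations:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            # Target coincides with a station: return its reading directly.
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no zero targets)."""
    return 100.0 * mean(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))


def mdape(y_true, y_pred):
    """Median absolute percentage error, in percent (assumes no zero targets)."""
    return 100.0 * median(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))


def r2_score(y_true, y_pred):
    """Coefficient of determination (assumes y_true is not constant)."""
    t_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - t_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A baseline like this uses only distances and station readings. The decoder described in the abstract additionally conditions on meteorology at the target point and learns its weighting through attention.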