A dataset of harmonized global air quality monitoring metadata

Published: 16 Feb 2026, Last Modified: 16 Apr 2026 | Springer Nature Scientific Data | Everyone | CC BY 4.0
Abstract: This study addresses the gap in air quality monitoring metadata reporting by building a classifier for air quality station types and area characteristics. The classifier draws on ultra-high-resolution land cover data, complemented by demographic and other gridded information. We employ machine learning methods including convolutional neural networks and transformers. Using a custom training approach, we fine-tune pre-trained models on 7000 images and use them to label more than 8000 additional monitors, yielding a robust model that classifies air quality stations by area characteristics (urban, rural) and source type (background, non-background). The result is a globally harmonized dataset of governmental air quality station metadata for particulate matter, covering ~15000 monitors in 106 countries. For each station, the dataset provides an identifier, geographical coordinates, country, area characteristics, source type, and classification status. This dataset enables global feasibility studies and regional analyses of the conditions leading to exposure. By classifying monitoring stations consistently, it also allows meaningful comparisons of sectoral exposure contributions across countries, regions, and station types, supporting comparative studies and health impact assessments.
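To make the per-station fields listed in the abstract concrete, the following is a minimal Python sketch of what one record might look like and how stations could be tallied by class. The field names, the `Station` type, the `count_by_class` helper, and the `status` values are all hypothetical; the abstract names the fields but does not specify column names, encodings, or a file format. The records shown are placeholders, not data from the dataset.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Station:
    """One station record; all field names are assumed, not taken from the dataset."""
    station_id: str
    latitude: float
    longitude: float
    country: str
    area: str          # "urban" or "rural"
    source_type: str   # "background" or "non-background"
    status: str        # assumed values: "reported" or "predicted"


def count_by_class(stations):
    """Count stations per (area, source_type) pair."""
    return Counter((s.area, s.source_type) for s in stations)


# Placeholder records for illustration only.
stations = [
    Station("XX-001", 10.0, 20.0, "Country A", "urban", "background", "reported"),
    Station("XX-002", 10.5, 20.5, "Country A", "urban", "non-background", "predicted"),
    Station("YY-001", -5.0, 30.0, "Country B", "rural", "background", "predicted"),
    Station("YY-002", -5.5, 30.5, "Country B", "urban", "background", "reported"),
]

counts = count_by_class(stations)
for (area, source_type), n in sorted(counts.items()):
    print(f"{area:6s} {source_type:15s} {n}")
```

Grouping on the (area, source type) pair reflects the two classification axes the abstract describes; a comparison across countries or regions would add the `country` field to the grouping key.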