Open High-Resolution Satellite Imagery: The WorldStrat Dataset – With Application to Super-Resolution

06 Jun 2022, 19:01 (modified: 11 Oct 2022, 22:11)
NeurIPS 2022 Datasets and Benchmarks
Readers: Everyone
Keywords: satellite imagery, land use, transfer learning, multi-resolution, stratification, high-resolution
TL;DR: The largest high-resolution satellite imagery dataset, stratified to cover all land uses and urbanisation densities along with humanitarian uses, and paired with lower-resolution imagery, to best represent the planet.
Abstract: Analyzing the planet at scale with satellite imagery and machine learning is a dream that has been constantly hindered by the cost of high-resolution imagery that is both difficult to access and highly representative. To remedy this, we introduce the WorldStratified (WorldStrat) dataset. It is the largest and most varied such publicly available dataset, at the Airbus SPOT 6/7 satellites' high resolution of up to 1.5 m/pixel. Empowered by the European Space Agency's Phi-Lab as part of the ESA-funded QueryPlanet project, we curate 10,000 sq km of unique locations to ensure stratified representation of all types of land use across the world: from agriculture to ice caps, from forests to multiple urbanization densities. We also enrich these with locations typically under-represented in ML datasets: sites of humanitarian interest, illegal mining sites, and settlements of persons at risk. We temporally match each high-resolution image with multiple low-resolution images from the freely accessible Sentinel-2 satellites at 10 m/pixel. We accompany this dataset with an open-source Python package to rebuild or extend the WorldStrat dataset, train and infer baseline algorithms, and learn with abundant tutorials, all compatible with the popular EO-learn toolbox. We thereby hope to foster broad-spectrum applications of ML to satellite imagery, and possibly to develop from free public low-resolution Sentinel-2 imagery the same power of analysis allowed by costly private high-resolution imagery. We illustrate this specific point by training and releasing several highly compute-efficient baselines on the task of Multi-Frame Super-Resolution. License-wise, the high-resolution Airbus imagery is CC-BY-NC, while the labels, Sentinel-2 imagery, and trained weights are under CC-BY, and the source code is under BSD, to allow for the widest use and dissemination. The dataset is available at https://zenodo.org/record/6810792 and the software package at https://github.com/worldstrat/worldstrat.
Supplementary Material: pdf
URL: https://zenodo.org/record/6810792
Dataset Url: https://zenodo.org/record/6810792
License: The high-resolution Airbus imagery is CC-BY-NC, while the labels, Sentinel-2 imagery, and trained weights are under CC-BY, and the source code is under BSD, to allow for the widest use and dissemination.
Author Statement: Yes
Contribution Process Agreement: Yes
In Person Attendance: Yes
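
The abstract mentions temporally matching each high-resolution image with multiple low-resolution Sentinel-2 images. As a rough illustration only, the sketch below picks the k low-resolution acquisition dates closest in time to a high-resolution acquisition, and computes the pixel-size ratio implied by the two resolutions quoted above (10 m/pixel versus 1.5 m/pixel). The function name `closest_revisits`, the value of `k`, and all dates are hypothetical; this is not the WorldStrat package's API and not necessarily the matching rule the authors used.

```python
from datetime import date


def closest_revisits(hr_date, lr_dates, k=3):
    """Return the k low-resolution dates nearest in time to hr_date.

    Hypothetical helper for illustration; not part of the WorldStrat package.
    """
    # Sort by absolute distance in days from the high-resolution acquisition.
    return sorted(lr_dates, key=lambda d: abs((d - hr_date).days))[:k]


# Hypothetical acquisition dates, chosen only to exercise the function.
hr = date(2021, 6, 15)
lr = [date(2021, 6, day) for day in (1, 6, 11, 16, 21, 26)]
matched = closest_revisits(hr, lr, k=3)

# Ratio of ground sampling distances quoted in the abstract:
# one 10 m low-resolution pixel spans about 6.67 high-resolution pixels per side.
scale = 10.0 / 1.5
```

With these illustrative inputs, the three selected dates are the ones 1, 4, and 6 days away from the high-resolution acquisition. A real pipeline would likely also filter candidates, for example by cloud cover, which this sketch ignores.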