Keywords: remote sensing, deep learning, self-supervised learning, ai for good, land cover mapping, satellite data, agriculture
TL;DR: Galileo is a generalist model of remote sensing data: a single model processes multiple input modalities and shapes (image time series, pixel time series) and achieves state-of-the-art accuracy on 11 diverse benchmarks.
Abstract: We introduce a highly multimodal transformer that analyzes many remote sensing modalities (multispectral optical, synthetic aperture radar, elevation maps, weather, pseudo-labels, and more) across space and time. These inputs are useful for diverse remote sensing tasks, such as crop mapping and flood detection. However, learning representations of remote sensing data is challenging; for example, objects of interest vary massively in scale, from small vessels (1-2 pixels and transient) to glaciers (thousands of pixels and persistent). We present a novel self-supervised learning algorithm that extracts multi-scale features through masked modeling. Our two-task approach consists of global and local training objectives that differ in their prediction targets (deep vs. shallow) and masking strategies (structured vs. unstructured). With a single pretrained encoder, our Galileo model outperforms state-of-the-art (SoTA) models for satellite images and pixel time series, as evaluated extensively on eleven benchmarks spanning multiple task types.
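
To make the distinction between structured and unstructured masking concrete, the following is a minimal illustrative sketch and not the Galileo implementation. It assumes a hypothetical token grid of shape (time, height, width); the function names, the masking ratio, and the choice of whole timesteps as the structured unit are all assumptions made for illustration. The abstract does not specify these details, nor which objective uses which strategy.

```python
import numpy as np

def random_mask(shape, ratio, rng):
    """Unstructured masking: hide a random subset of tokens, chosen independently of position."""
    n = int(np.prod(shape))
    n_masked = int(round(ratio * n))
    flat = np.zeros(n, dtype=bool)
    flat[rng.choice(n, size=n_masked, replace=False)] = True
    return flat.reshape(shape)

def timestep_mask(shape, ratio, rng):
    """Structured masking (one possible form): hide whole timesteps,
    so every spatial token at a masked time is hidden together."""
    t = shape[0]
    n_masked = int(round(ratio * t))
    mask = np.zeros(shape, dtype=bool)
    mask[rng.choice(t, size=n_masked, replace=False)] = True
    return mask

rng = np.random.default_rng(0)
shape = (4, 6, 6)  # hypothetical token grid: (time, height, width)
unstructured = random_mask(shape, 0.5, rng)
structured = timestep_mask(shape, 0.5, rng)
```

Both masks hide the same fraction of tokens, but the structured mask removes contiguous groups, so the model cannot fill a gap by copying from immediate neighbours along the masked axis.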
Submission Number: 15