Multisensory Geospatial Models via Cross-Sensor Pretraining

Published: 23 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: geospatial pretraining, multisensor modalities, cross-sensor pretraining, remote sensing applications, masked image modeling
Abstract: Geospatial remote sensing data, acquired by optical and microwave sensors, exhibit significant diversity and provide unique capabilities owing to their different observing mechanisms. By fusing multi-sensor data, researchers can harness the complementary and synergistic nature of optical and microwave observations to achieve more accurate and efficient Earth monitoring. Although geospatial pretraining models have been shown to enhance a variety of downstream tasks, most research focuses on a single sensor modality. To unlock these synergies, we introduce XGeo, a multi-sensor geospatial pretraining model trained on four sensor modalities: RGB channels, Sentinel-2, Synthetic Aperture Radar (SAR), and Digital Surface Model (DSM) data, encompassing a total of two million multisensor images. Our method handles both paired and unpaired data effectively. When images originate from the same geolocation, we integrate the cross-linked corresponding sensors into masked image modeling, which facilitates learning a joint representation across sensors. In addition, we employ mixture-of-experts layers and heterogeneous batches to mitigate data heterogeneity. Our experiments show that XGeo improves performance on both single-sensor and multisensor downstream tasks, such as land-use classification, segmentation, cloud removal, and pan-sharpening. We also find that representations learned from natural images differ from those of some geospatial sensors, which renders existing representations less effective. Our work serves as a comprehensive guide for developing robust multisensor geospatial pretraining models, paving the way for more advanced geospatial capabilities.
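To make the cross-sensor pretraining idea concrete, below is a minimal PyTorch sketch of masked image modeling with a cross-linked paired sensor. It assumes a simplified two-sensor setup with pre-extracted, spatially aligned patch tokens; all module names, dimensions, and the masking scheme are illustrative assumptions, not the authors' actual XGeo implementation (which additionally uses mixture-of-experts layers and heterogeneous batches, not shown here).

```python
# Hedged sketch: cross-sensor masked image modeling for two paired
# sensors (e.g. RGB + SAR patches from the same geolocation).
# All names and hyperparameters are hypothetical illustrations.
import torch
import torch.nn as nn


class CrossSensorMAE(nn.Module):
    def __init__(self, patch_dim=768, dim=256, depth=4, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        # Separate patch embedding per sensor modality (assumed design).
        self.embed_a = nn.Linear(patch_dim, dim)
        self.embed_b = nn.Linear(patch_dim, dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.decoder = nn.Linear(dim, patch_dim)  # reconstruct raw patches

    def forward(self, patches_a, patches_b):
        # patches_*: (batch, num_patches, patch_dim), spatially aligned.
        tok_a = self.embed_a(patches_a)
        tok_b = self.embed_b(patches_b)
        b, n, d = tok_a.shape
        # Randomly mask a fraction of sensor-A tokens.
        mask = torch.rand(b, n, device=tok_a.device) < self.mask_ratio
        tok_a = torch.where(
            mask.unsqueeze(-1), self.mask_token.expand(b, n, d), tok_a
        )
        # Cross-link: encode both sensors jointly, so visible sensor-B
        # tokens can inform reconstruction of masked sensor-A patches.
        joint = self.encoder(torch.cat([tok_a, tok_b], dim=1))
        recon_a = self.decoder(joint[:, :n])
        # Reconstruction loss only on masked positions.
        loss = ((recon_a - patches_a) ** 2).mean(-1)[mask].mean()
        return loss
```

In this sketch, the unmasked tokens of the paired sensor remain in the encoder sequence while the masked sensor is reconstructed, so minimizing the reconstruction loss pushes the shared encoder toward a joint cross-sensor representation; for unpaired data, the same model could be trained with only one sensor's tokens in the sequence.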
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8376