Keywords: machine learning, remote sensing
Abstract: Remote sensing imagery from systems such as Sentinel provides full coverage of the Earth's surface at roughly 10-meter resolution. The remote sensing community has shifted to extensive use of deep learning models, motivated by their strong performance on benchmarks such as ISPRS Vaihingen. Convolutional models such as UNet and ResNet variants are commonly employed for remote sensing, but because they were developed for RGB imagery they typically accept only three channels, whereas Sentinel satellite systems provide more than ten spectral bands. Recently, a number of transformer architectures have also been proposed for remote sensing, but they have generally not been extensively benchmarked and have been applied only to rather small datasets. Meanwhile, it is becoming possible to obtain dense spatial land-use labels for entire first-level administrative divisions of some countries. Scaling-law observations suggest that substantially larger, multi-spectral transformer models could yield a major leap in the performance of remote sensing models in these settings. In this work, we develop a family of multi-spectral transformer models and evaluate them across orders-of-magnitude differences in parameter count to assess their performance and scaling behavior on a densely labeled imagery dataset. We develop a novel multi-spectral attention strategy and demonstrate its effectiveness through ablations. We further show in this setting that models more than an order of magnitude larger than conventional architectures such as UNet lead to substantial improvements in accuracy: a UNet++ model with 23M parameters achieves less than 65% accuracy, while a multi-spectral transformer with 655M parameters achieves over 95% accuracy on the Biological Valuation Map of Flanders.
A link to open-source code will be provided in the camera-ready version.
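Since the code is not yet released, the following is a minimal sketch of one plausible form of multi-spectral attention consistent with the abstract: each spectral band is patch-embedded separately, and self-attention is applied across the band axis at each spatial location. All class names, the band count (12, Sentinel-2 style), and dimensions here are illustrative assumptions, not the authors' actual architecture.

```python
# Hypothetical sketch only -- NOT the paper's method. It illustrates how a
# transformer could ingest >3 spectral channels by attending across bands.
import torch
import torch.nn as nn

class SpectralPatchEmbed(nn.Module):
    """Embed each spectral band independently into patch tokens."""
    def __init__(self, bands=12, patch=16, dim=256):
        super().__init__()
        # One projection per band; shared weights would be equally plausible.
        self.proj = nn.ModuleList(
            [nn.Conv2d(1, dim, kernel_size=patch, stride=patch) for _ in range(bands)]
        )

    def forward(self, x):  # x: (B, bands, H, W)
        tokens = [p(x[:, i:i + 1]).flatten(2).transpose(1, 2)  # (B, N, dim)
                  for i, p in enumerate(self.proj)]
        return torch.stack(tokens, dim=2)  # (B, N, bands, dim)

class SpectralAttention(nn.Module):
    """Self-attention over the spectral-band axis at each spatial patch."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, t):  # t: (B, N, bands, dim)
        B, N, C, D = t.shape
        t = t.reshape(B * N, C, D)  # treat the bands as the attention sequence
        out, _ = self.attn(t, t, t)
        return out.reshape(B, N, C, D)

# Usage: a 12-band input patch at 64x64 pixels.
x = torch.randn(2, 12, 64, 64)
tokens = SpectralPatchEmbed()(x)
mixed = SpectralAttention()(tokens)
print(mixed.shape)  # torch.Size([2, 16, 12, 256])
```

In this sketch, spectral mixing is decoupled from spatial mixing, so standard spatial transformer blocks could be interleaved afterward; whether the paper fuses the two axes differently is not stated in the abstract.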
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10407