DeepAndes: A Self-Supervised Vision Foundation Model for Multispectral Remote Sensing Imagery of the Andes

Junlin Guo, James R. Zimmer-Dauphinee, Jordan M. Nieusma, Siqi Lu, Quan Liu, Ruining Deng, Can Cui, Jialin Yue, Yizhe Lin, Tianyuan Yao, Juming Xiong, Junchao Zhu, Chongyu Qu, Yuechen Yang, Mitchell Wilkes, Xiao Wang, Parker VanValkenburgh, Steven A. Wernke, Yuankai Huo

Published: 01 Jan 2025, Last Modified: 26 Feb 2026
Venue: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
License: CC BY-SA 4.0

Abstract: By mapping sites at large scales using remotely sensed data, archaeologists can generate unique insights into long-term demographic trends, interregional social networks, and human adaptations in the past. Remote sensing surveys complement field-based approaches, and their reach can be greatly extended when combined with deep learning and computer vision techniques. However, conventional supervised deep learning methods are hampered by the difficulty of annotating fine-grained archaeological features at scale. In addition, while recent vision foundation models have shown remarkable success in learning from large-scale remote sensing data with minimal annotations, most off-the-shelf solutions are designed for RGB images rather than multispectral satellite imagery, such as the eight-band data used in our study. In this article, we introduce DeepAndes, a transformer-based vision foundation model trained on three million multispectral satellite images and specifically tailored for Andean archaeology. DeepAndes incorporates a customized DINOv2 self-supervised learning algorithm optimized for eight-band multispectral imagery, making it the first foundation model designed explicitly for the Andes region. We evaluate its image understanding performance on three tasks: imbalanced image classification, image instance retrieval, and pixel-level semantic segmentation. Our experiments show that DeepAndes achieves superior F1 scores, mean average precision, and Dice scores in few-shot learning scenarios, significantly outperforming models trained from scratch or pretrained on smaller datasets. These results underscore the effectiveness of large-scale self-supervised pretraining in archaeological remote sensing.
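The abstract reports F1, mean average precision (mAP), and Dice scores without defining them. As an illustration only, the sketch below implements the textbook binary definitions of these three metrics in plain Python. It assumes binary class labels, 2-D binary segmentation masks, and ranked retrieval lists in which every relevant item appears; it does not reproduce the paper's evaluation code, its per-class averaging, or its thresholds, and the function names are hypothetical.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 for the positive class (label 1): 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # No positives predicted or present: report 0.0 by convention.
    return 2 * tp / denom if denom else 0.0


def dice_score(pred_mask: Sequence[Sequence[int]],
               true_mask: Sequence[Sequence[int]]) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) over two binary masks."""
    pred = [v for row in pred_mask for v in row]
    true = [v for row in true_mask for v in row]
    intersection = sum(1 for p, t in zip(pred, true) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in true if t)
    # Convention (assumption): two empty masks count as perfect agreement.
    return 2 * intersection / total if total else 1.0


def average_precision(ranked_relevance: Sequence[int]) -> float:
    """AP for one query: mean of precision@k at each rank k holding a relevant item.

    Assumes all relevant items for the query appear in the ranked list.
    """
    hits = 0
    precision_sum = 0.0
    for k, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries: Sequence[Sequence[int]]) -> float:
    """mAP: average of per-query AP values."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


if __name__ == "__main__":
    print(f1_score([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
    print(dice_score([[1, 1, 0], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]]))
    print(mean_average_precision([[1, 0, 1, 0], [0, 1]]))
```

Note that, on binary masks, Dice is the pixel-level analogue of F1 (both equal 2TP / (2TP + FP + FN)), which is one reason the two metrics are commonly reported side by side for classification and segmentation.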

External IDs: doi:10.1109/jstars.2025.3619423