Abstract: Self-supervised learning (SSL) through masked autoencoders (MAEs) has recently attracted great attention for remote sensing (RS) foundation model (FM) development, enabling improved representation learning across diverse sensors and downstream tasks. However, existing RS FMs often either suffer from substantial computational complexity during both training and inference or exhibit limited representational capacity. These issues restrict their practical applicability in RS. To address these limitations, we propose an adaptation that enhances the efficiency of RS FMs by integrating the Soft mixture-of-experts (MoE) mechanism into the FM, allowing modality-specific expert specialization alongside shared cross-sensor representation learning. To demonstrate the effectiveness of our adaptation, we apply it to the Cross-Sensor Masked Autoencoder (CSMAE) model, resulting in the Cross-Sensor Mixture-of-Experts (CSMoE) model. In addition, we introduce a thematic-climatic descriptor-driven sampling strategy to construct a representative and diverse training set for our CSMoE model. Extensive experiments on scene classification, semantic segmentation, and content-based image retrieval (CBIR) demonstrate that our adaptation reduces computational requirements while maintaining or improving representational performance. Compared to state-of-the-art RS FMs, CSMoE achieves a superior trade-off between representational capacity, accuracy, and computational efficiency. On average, CSMoE achieves more than twice the computational efficiency of existing RS FMs while maintaining competitive performance across all experiments. These results highlight the effectiveness of the proposed adaptation for creating scalable and computationally efficient RS FMs. The code for the model and the training set creation, as well as the pretrained model weights, will be available at https://git.tu-berlin.de/rsim/csmoe.
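To make the Soft MoE mechanism referenced above concrete, the following is a minimal NumPy sketch of the generic soft dispatch/combine idea behind Soft MoE layers: every token contributes to every expert slot through softmax weights, so no tokens are dropped or hard-routed. This is an illustrative sketch only, not the authors' CSMoE implementation; the function names, the slot-parameter tensor `phi`, and the toy experts are all hypothetical.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def soft_moe(tokens, phi, experts):
    """Generic Soft MoE layer sketch (illustrative, not the CSMoE code).

    tokens : (n, d) input token embeddings
    phi    : (d, e, s) learnable slot parameters (e experts, s slots each)
    experts: list of e callables, each mapping (s, d) -> (s, d)
    """
    n, _ = tokens.shape
    e, s = phi.shape[1], phi.shape[2]
    # Token-slot affinity logits.
    logits = np.einsum('nd,des->nes', tokens, phi)          # (n, e, s)
    # Dispatch: softmax over tokens -> each slot is a convex mix of tokens.
    dispatch = softmax(logits.reshape(n, -1), axis=0).reshape(n, e, s)
    # Combine: softmax over slots -> each token is a convex mix of slot outputs.
    combine = softmax(logits.reshape(n, -1), axis=1).reshape(n, e, s)
    slot_inputs = np.einsum('nes,nd->esd', dispatch, tokens)  # (e, s, d)
    slot_outputs = np.stack([experts[i](slot_inputs[i]) for i in range(e)])
    return np.einsum('nes,esd->nd', combine, slot_outputs)    # (n, d)
```

Because dispatch and combine are dense softmax mixtures rather than top-k routing, the layer stays fully differentiable while each expert processes only a small, fixed number of slots, which is the source of the efficiency gain.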