Keywords: remote sensing, generative foundation model, data pruning
Abstract: Large-scale datasets have propelled progress in generative foundation models for remote sensing, but training on such data incurs substantial storage and compute costs. In addition, globally collected raw data often exhibit redundancy, noise, and class imbalance, which undermine training efficiency and generation quality. Existing remote sensing generative foundation models typically aggregate multiple classification datasets or apply simplistic deduplication, overlooking both the distributional requirements of generative modeling and the inherent heterogeneity and diversity of remote sensing imagery. To address these limitations, we propose an efficient two-stage data pruning approach for remote sensing generative foundation models that jointly accounts for local information content and global scene-level diversity and representativeness. Specifically, an entropy-based criterion is first applied to efficiently eliminate low-information samples. Leveraging remote sensing scene classification datasets as reference benchmarks, we then perform scene-aware clustering with stratified sampling, which improves clustering effectiveness while reducing its computational cost on large-scale unlabeled data. Finally, by balancing cluster-level uniformity with sample representativeness, the method enables fine-grained selection under high pruning ratios while preserving overall diversity and representativeness. Experiments on both curated remote sensing datasets and large-scale global data demonstrate that our pruning strategy significantly improves convergence and generation quality. Moreover, generative foundation models trained with our method consistently achieve state-of-the-art performance across multiple downstream tasks, including super-resolution and semantic image synthesis.
This data pruning paradigm provides practical guidance and an empirical reference for the development of remote sensing generative foundation models.
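The two-stage pipeline described in the abstract can be sketched in simplified form. This is an illustrative assumption, not the paper's implementation: `image_entropy`, `entropy_filter`, and `stratified_cluster_sample` are hypothetical names, histogram entropy stands in for the paper's entropy criterion, and plain k-means (Lloyd's algorithm) stands in for scene-aware clustering guided by classification benchmarks.

```python
import numpy as np

def image_entropy(img, bins=256):
    """Shannon entropy (bits) of an image's intensity histogram."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]  # ignore empty bins; 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())

def entropy_filter(images, threshold):
    """Stage 1: keep indices of samples whose entropy meets the threshold,
    dropping low-information imagery (e.g. uniform water or cloud tiles)."""
    return [i for i, img in enumerate(images) if image_entropy(img) >= threshold]

def stratified_cluster_sample(features, n_clusters, keep_ratio, seed=0):
    """Stage 2: cluster feature vectors, then sample a fixed ratio per cluster,
    preferring samples nearest each centroid as most representative."""
    rng = np.random.default_rng(seed)
    # Plain k-means as a stand-in for scene-aware clustering.
    centroids = features[rng.choice(len(features), n_clusters, replace=False)]
    for _ in range(20):
        dists = np.linalg.norm(features[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            if (labels == k).any():
                centroids[k] = features[labels == k].mean(axis=0)
    # Stratified selection: same keep_ratio in every cluster preserves
    # scene diversity even under high pruning ratios.
    selected = []
    for k in range(n_clusters):
        idx = np.where(labels == k)[0]
        if len(idx) == 0:
            continue
        n_keep = max(1, int(round(keep_ratio * len(idx))))
        dist = np.linalg.norm(features[idx] - centroids[k], axis=1)
        selected.extend(idx[np.argsort(dist)[:n_keep]].tolist())
    return sorted(selected)
```

In this toy form, a constant tile scores zero entropy and is pruned in stage 1, while stage 2 keeps `keep_ratio` of each cluster, balancing per-cluster uniformity against representativeness via distance to the centroid.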
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 2744