Keywords: Latent Diffusion, Synthetic data, Text-to-image generation, Satellite Imagery
Abstract: While satellite data is essential for applying computer vision to many real-world tasks, it remains expensive to acquire. Although other computer vision tasks have alleviated data procurement costs by augmenting training datasets with synthetic images from text-to-image models, such augmentation remains underdeveloped in the remote sensing domain. In this work, we propose an alternative approach for generating synthetic training data tailored to satellite imagery. To better understand the underlying problem, we begin by analyzing the impact of the target data distribution in comparison to the distributions used to train the text-to-image generation model. We find that data rarity is strongly correlated with the effectiveness of synthetic training data produced by Stable Diffusion fine-tuned on few-shot examples, suggesting that rarity can serve as a low-cost proxy for pre-evaluating the effectiveness of synthetic data generation. Notably, our analysis shows that Stable Diffusion struggles to produce useful training images for rare, out-of-distribution data. Building on this insight, we propose two modifications to the generation process tailored to satellite images: offset noise and leak-aligned noise. Both are designed to adjust the initial noise distribution and correct low-frequency characteristics. Our approaches enable improved training performance for classifiers trained on synthetic data, demonstrated on three satellite benchmarks.
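Of the two noise modifications named in the abstract, offset noise is a known diffusion-training trick: the per-pixel Gaussian noise is augmented with a per-channel constant offset so the model learns to shift low-frequency content (e.g. overall brightness) rather than inheriting it from the initial noise. The sketch below illustrates that idea only; the function name, `offset_strength` parameter, and NumPy formulation are illustrative assumptions, not the paper's implementation, and the paper-specific leak-aligned noise is not reproduced here.

```python
import numpy as np

def offset_noise(shape, offset_strength=0.1, seed=None):
    """Illustrative sketch of offset noise (not the paper's exact method).

    Draws standard per-pixel Gaussian noise, then adds a single random
    offset per (batch, channel) broadcast over the spatial dimensions.
    The offset perturbs the mean of each channel, i.e. the lowest
    spatial frequency, which plain i.i.d. noise barely moves.
    """
    b, c, h, w = shape
    rng = np.random.default_rng(seed)
    base = rng.standard_normal((b, c, h, w))          # per-pixel noise
    offset = rng.standard_normal((b, c, 1, 1))        # per-channel shift
    return base + offset_strength * offset
```

With `offset_strength=0`, this reduces to the standard i.i.d. Gaussian noise used in vanilla diffusion training; larger values give the sampler more freedom over low-frequency characteristics such as the overall tone of a satellite scene.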
Primary Area: generative models
Submission Number: 5860