DLDC: A Dual Loop Data Cleaning Method for Fine-tuning Remote Sensing Image Generative Models

Published: 30 Oct 2025, Last Modified: 17 Feb 2026 · IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing · Everyone · CC BY-NC-ND 4.0
Abstract: Text-to-image (T2I) generation, which offers flexible and intuitive synthetic data for downstream geoscience applications, has attracted increasing attention in recent years. Training a good T2I model usually requires high-quality, large-scale image–text datasets. However, such datasets are hard to obtain in remote sensing (RS) because annotation is costly and requires domain-specific knowledge. This study proposes a dual loop data cleaning (DLDC) method, which uses contrastive multimodal quality evaluation to generate high-quality RS image–text training data automatically. DLDC combines an external generation loop (EGL), built on a multimodal foundation model, with an internal evaluation loop (IEL), built on contrastive learning metrics, so that it can automatically generate layout descriptions for satellite images and evaluate how well each image matches its text. The approach filters out noisy samples and curates a refined dataset without human intervention. Experimental results show that the dual loop evaluation can accurately determine the optimal data cleaning ratio for different scenes, improving image generation quality. Compared with the pretrained T2I models, our fine-tuned models reduce Fréchet Inception Distance values by over 35%, increase CLIP scores by more than 25%, and improve RemoteCLIP scores by over 10.5%. Furthermore, DLDC outperforms other state-of-the-art RS T2I models (e.g., Crs-diff, GeoRSSD, DiffusionSAT). The data cleaning method also benefits downstream segmentation tasks, with gains of 8.14% in mean IoU and 7.5% in mean accuracy over the same model trained on raw, uncleaned data. Experimental results demonstrate that our automatically generated image–text data is similar in quality to manually annotated data, opening new pathways for rapid, cost-effective, and reliable RS data generation.
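The filtering idea described in the abstract (score each image–text pair with a contrastive metric, then discard the worst-matching fraction) can be illustrated with a minimal sketch. This is not the authors' IEL implementation: the function name `clean_by_similarity`, the use of plain cosine similarity, and the fixed `clean_ratio` argument are all assumptions made for illustration, and the embeddings are assumed to come from some CLIP-style encoder that is not shown here.

```python
import numpy as np


def clean_by_similarity(image_emb, text_emb, clean_ratio):
    """Drop the lowest-scoring fraction of image-text pairs.

    Hypothetical helper, not taken from the paper.
    image_emb, text_emb: arrays of shape (n_pairs, dim), assumed to be
        embeddings of paired images and captions from a CLIP-style encoder.
    clean_ratio: fraction of pairs to discard, in [0, 1).
    Returns (kept_indices, scores), with kept_indices in ascending order.
    """
    if not 0.0 <= clean_ratio < 1.0:
        raise ValueError("clean_ratio must be in [0, 1)")

    image_emb = np.asarray(image_emb, dtype=float)
    text_emb = np.asarray(text_emb, dtype=float)

    # Normalise each row so the row-wise dot product is a cosine similarity.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    scores = np.sum(img * txt, axis=1)

    # Sort pairs from worst to best match and discard the first n_drop.
    n_drop = int(np.floor(clean_ratio * len(scores)))
    order = np.argsort(scores, kind="stable")
    kept_indices = np.sort(order[n_drop:])
    return kept_indices, scores


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real encoder outputs.
    images = [[1, 0], [0, 1], [1, 1], [1, 0]]
    texts = [[1, 0], [1, 0], [1, 0], [0, 1]]
    kept, scores = clean_by_similarity(images, texts, clean_ratio=0.5)
    print("kept pairs:", kept.tolist())
    print("scores:", np.round(scores, 3).tolist())
```

In this sketch the cleaning ratio is passed in by hand; the abstract states that DLDC determines the optimal ratio per scene through its dual loop evaluation, which the sketch does not attempt to reproduce.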