Abstract: Synthetic data is increasingly used to train deep models when large-scale annotated real datasets are unavailable, but performance often degrades due to the domain gap between synthetic and real images. We propose a diffusion-based framework for synthetic-to-real style transfer that produces realistic images while preserving semantic structure. Our method builds on latent diffusion models with ControlNet and introduces three key ideas. First, we design a dual-control representation that fuses segmentation maps with Canny edges, ensuring both semantic layout fidelity and fine-grained detail preservation while improving efficiency by avoiding multiple control passes. Second, we introduce \emph{domain-aware prompting}, where lightweight tokens (``synthetic'' or ``real'') are added to prompts to control domain style in image translation. Third, we adopt an iterative refinement loop in which generated images with artifacts are progressively reintroduced into training, allowing the model to correct its own errors. Experiments on GTA-to-Cityscapes show that our approach reduces the domain gap, improves mean IoU, and trains significantly faster than GAN-based baselines. Our code and data are available at \url{https://github.com/bds-ailab/syn2real}.
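To make the dual-control and domain-aware prompting ideas concrete, the sketch below illustrates one possible realization using the Hugging Face \texttt{diffusers} library: a segmentation map and Canny edges are fused into a single conditioning image for one ControlNet pass, and a lightweight domain token is prepended to the prompt. The checkpoint names, file paths, and the specific fusion rule are illustrative assumptions, not the authors' released implementation.
\begin{verbatim}
# Minimal sketch of dual-control conditioning + domain-aware prompting.
# Checkpoints, file names, and the fusion rule are assumptions for illustration.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

def fuse_controls(seg_map: np.ndarray, rgb: np.ndarray) -> Image.Image:
    """Fuse a color-coded segmentation map with Canny edges into one control
    image, so a single ControlNet pass carries both layout and detail cues."""
    gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    fused = seg_map.copy()
    fused[edges > 0] = 255  # overlay edges on the segmentation colors (assumed rule)
    return Image.fromarray(fused)

# Publicly available segmentation-conditioned ControlNet (assumed stand-in).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-seg", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

rgb = np.array(Image.open("gta_frame.png").convert("RGB"))        # synthetic frame
seg = np.array(Image.open("gta_frame_seg.png").convert("RGB"))    # its label map
control = fuse_controls(seg, rgb)

# Domain-aware prompting: a lightweight domain token steers style toward real imagery.
prompt = "real, street scene, photo"
image = pipe(prompt, image=control, num_inference_steps=30).images[0]
image.save("gta_frame_real.png")
\end{verbatim}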