Focused Diffusion GAN: Object-Centric Image Generation Using Integrated GAN and Diffusion Frameworks

ICLR 2026 Conference Submission17219 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0

Keywords: Generative Models, Diffusion Models, Object-Centric, hybrid model

TL;DR: Hybrid Diffusion-GAN model

Abstract: Generative Adversarial Networks (GANs) and Diffusion Models (DMs) have shown significant progress in synthesizing high-quality object-centric images. However, generating realistic object-centric images remains challenging when training datasets are limited or contain degraded images (e.g., privacy-induced face blurring). Under these conditions, existing generative models frequently produce images that lack perceptual quality or overfit to the training examples. To overcome these limitations, we propose a novel hybrid generative model, Focused Diffusion-GAN (FDGAN), targeting low-data object-centric regimes, which integrates a GAN discriminator directly into the diffusion model at intermediate denoising stages. Central to FDGAN is an Additional Noise Perturbation Module (ANPM) that selectively activates the GAN component only for sufficiently denoised images, ensuring the discriminator receives meaningful input. Additionally, ANPM applies targeted noise perturbations within predefined bounding-box regions, implicitly guiding the model’s focus toward key objects. Unlike LayoutDiffusion, which explicitly conditions synthesis on fixed bounding-box layouts, and unlike Diffusion-GAN and StyleGAN2-ADA, which employ noise augmentation throughout the entire training process, FDGAN combines adversarial training with targeted noise perturbations at specific intermediate diffusion steps. We evaluate FDGAN on three small object-centric datasets (Cityscapes subset, Traffic-Signs, and MS-COCO "potted plant") and, against strong GAN, diffusion, and object-centric baselines, show improved perceptual quality (Fréchet Distance) and reduced overfitting (Feature Likelihood Score). Ablation studies indicate that selective mid-timestep adversarial guidance together with ANPM improves the realism–overfitting trade-off in limited-data generative tasks.
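
The two ANPM behaviours named in the abstract (gating the adversarial component on how denoised a sample is, and adding noise only inside a bounding box) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `gan_is_active` and `perturb_in_bbox`, the threshold `t_gate`, the noise scale `sigma`, the `(x0, y0, x1, y1)` box format, and the convention that a smaller timestep means less noise are all assumptions made for illustration.

```python
import numpy as np


def gan_is_active(t, t_gate):
    """Hypothetical gate: enable the discriminator only when timestep t
    is at or below t_gate. Assumes smaller t means a less noisy sample."""
    return t <= t_gate


def perturb_in_bbox(image, bbox, sigma, rng):
    """Add Gaussian noise only inside bbox = (x0, y0, x1, y1).

    Pixels outside the box are returned unchanged. The box format and
    the noise scale sigma are illustrative assumptions.
    """
    x0, y0, x1, y1 = bbox
    out = image.copy()
    region = out[y0:y1, x0:x1]
    out[y0:y1, x0:x1] = region + sigma * rng.standard_normal(region.shape)
    return out


# Toy usage on an all-zero 8x8 RGB image with a 4x4 box in the centre.
rng = np.random.default_rng(0)
image = np.zeros((8, 8, 3), dtype=np.float32)
noisy = perturb_in_bbox(image, (2, 2, 6, 6), 0.5, rng)
```

In a training loop built on this sketch, the adversarial loss would presumably be computed only when `gan_is_active(t, t_gate)` holds, with the perturbed image passed to the discriminator; how FDGAN actually chooses the threshold and noise scale is not stated in the abstract.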

Primary Area: generative models

Submission Number: 17219
