Focused Diffusion GAN: Object-Centric Image Generation Using Integrated GAN and Diffusion Frameworks
Keywords: Generative Models, Diffusion Models, Object-Centric, Hybrid Model
TL;DR: Hybrid Diffusion-GAN model
Abstract: Generative Adversarial Networks (GANs) and Diffusion Models (DMs) have made significant progress in synthesizing high-quality object-centric images. However, generating realistic object-centric images remains challenging when training datasets are small or contain degraded images (e.g., privacy-induced face blurring). Under these conditions, existing generative models frequently produce images that lack perceptual quality or overfit to the training examples. To overcome these limitations, we propose Focused Diffusion-GAN (FDGAN), a novel hybrid generative model for low-data object-centric regimes that integrates a GAN discriminator directly into the diffusion model at intermediate denoising stages. Central to FDGAN is an Additional Noise Perturbation Module (ANPM) that activates the GAN component only for sufficiently denoised images, ensuring that the discriminator receives meaningful input. ANPM also applies targeted noise perturbations within predefined bounding-box regions, implicitly guiding the model’s focus toward key objects. FDGAN differs from LayoutDiffusion, which explicitly conditions synthesis on fixed bounding-box layouts, and from Diffusion-GAN and StyleGAN2-ADA, which employ noise augmentation throughout the entire training process: it combines adversarial training with targeted noise perturbations at specific intermediate diffusion steps. We evaluate FDGAN on three small object-centric datasets (a Cityscapes subset, Traffic-Signs, and MS-COCO "potted plant"). Against strong GAN, diffusion, and object-centric baselines, it shows improved perceptual quality (Fréchet Distance) and reduced overfitting (Feature Likelihood Score). Ablation studies indicate that selective mid-timestep adversarial guidance together with ANPM improves the realism–overfitting trade-off in limited-data generative tasks.
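The two ANPM behaviours named in the abstract (gating the discriminator by denoising stage, and perturbing only a bounding-box region) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `gan_active` and `perturb_in_box`, the timestep window `t_lo`/`t_hi`, the Gaussian form of the perturbation, and the noise scale `sigma` are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def gan_active(t, t_lo, t_hi):
    """Hypothetical gate: use the discriminator only inside an
    assumed intermediate timestep window [t_lo, t_hi]."""
    return t_lo <= t <= t_hi


def perturb_in_box(image, box, sigma, rng):
    """Add Gaussian noise only inside box = (x0, y0, x1, y1).

    image is an H x W x C float array; pixels outside the box are
    returned unchanged. The Gaussian noise model is an assumption.
    """
    x0, y0, x1, y1 = box
    out = image.copy()
    region = out[y0:y1, x0:x1]
    out[y0:y1, x0:x1] = region + sigma * rng.standard_normal(region.shape)
    return out


rng = np.random.default_rng(0)
image = np.zeros((8, 8, 3))
box = (2, 2, 6, 6)  # illustrative object bounding box
noisy = perturb_in_box(image, box, sigma=0.1, rng=rng)

# Pixels outside the box are untouched; pixels inside are perturbed.
outside = np.ones(image.shape[:2], dtype=bool)
outside[2:6, 2:6] = False
```

In a training loop, a gate like `gan_active` would decide per sampled timestep whether the adversarial term is added to the diffusion loss; how FDGAN actually selects those timesteps is described only at a high level in the abstract.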
Primary Area: generative models
Submission Number: 17219