Keywords: Diffusion Model, Probabilistic Methods
TL;DR: We propose a general fine-tuning approach to address the performance drop of diffusion models on imbalanced text-to-image generation tasks.
Abstract: Diffusion models have made significant advancements in recent years. However, their performance often deteriorates when they are trained or fine-tuned on imbalanced datasets. This degradation stems largely from the disproportionate representation of majority and minority data among the image-text pairs. In this paper, we propose a general fine-tuning approach, dubbed PoGDiff, to address this challenge. Rather than directly minimizing the KL divergence between the predicted and ground-truth distributions, PoGDiff replaces the ground-truth distribution with a Product of Gaussians (PoG), constructed by combining the original ground-truth target with the predicted distribution conditioned on a neighboring text embedding. Experiments on real-world datasets demonstrate that our method effectively addresses the imbalance problem in diffusion models, improving both generation accuracy and quality.
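For readers unfamiliar with the construction, the product of two Gaussian densities is itself an (unnormalized) Gaussian whose precision is the sum of the two precisions. The identity below is a generic sketch in illustrative notation (the symbols are not taken from the paper); here one factor stands for the original ground-truth target and the other for the prediction conditioned on a neighboring text embedding:

\[
  \mathcal{N}(x;\, \mu_1, \sigma_1^2)\, \mathcal{N}(x;\, \mu_2, \sigma_2^2)
  \;\propto\;
  \mathcal{N}\!\left(x;\ \mu_{\mathrm{PoG}},\ \sigma_{\mathrm{PoG}}^2\right),
\]
\[
  \sigma_{\mathrm{PoG}}^2 = \left(\sigma_1^{-2} + \sigma_2^{-2}\right)^{-1},
  \qquad
  \mu_{\mathrm{PoG}} = \sigma_{\mathrm{PoG}}^2 \left(\frac{\mu_1}{\sigma_1^2} + \frac{\mu_2}{\sigma_2^2}\right).
\]

The resulting mean is a precision-weighted average of the two means, which is one way a neighboring sample can pull the target for a minority example toward better-supported regions; how PoGDiff actually instantiates and weights these factors is specified in the paper itself.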
Primary Area: Probabilistic methods (e.g., variational inference, causal inference, Gaussian processes)
Submission Number: 14351