Submission Track: Track 1: Machine Learning Research by Muslim Authors
Keywords: Diversity quantification, CLIP embeddings, Prompt-based diversity, Information entropy
TL;DR: We propose DIVA, a unified framework for quantifying generative image diversity by combining CLIP-based, statistical, and entropy metrics.
Abstract: Generative models such as Stable Diffusion, DALL·E, and Imagen have shown impressive capabilities in creating visually compelling images from textual prompts. However, not all models produce a wide variety of outputs for the same prompt. In some applications, such as creative advertising or artistic design, diverse outputs are highly valued for exploring different visual interpretations. In contrast, tasks like forensic analysis or technical illustration require high consistency to ensure reproducibility. Current diversity quantification methods, such as Bayesian frameworks and pixel-based metrics (e.g., FID, SSIM), either ignore prompt-specific variability or fail to disentangle aleatoric and epistemic factors. In this work, we present DIVA, a framework that quantifies diversity through hybrid metrics: mean pairwise CLIP embedding distance, feature distribution variance, and information entropy. DIVA integrates these metrics into a unified diversity score that captures both aleatoric and epistemic uncertainty, and it adapts to both diversity-expected and diversity-constrained prompts. Human validation shows a strong correlation between our diversity score and human judgments. This work provides a scalable solution for applications requiring reliability and transparency, from creative design to medical imaging. GitHub repository: https://github.com/anonymous4865/diva
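The abstract names three components that feed the unified score: mean pairwise CLIP embedding distance, feature distribution variance, and information entropy. The sketch below is a minimal illustration of how such a combination could be computed from pre-extracted CLIP image embeddings; the equal weighting, the histogram-based entropy estimator, and the function name `diva_score` are assumptions for illustration, not the authors' exact formulation.

```python
# Minimal sketch of a DIVA-style diversity score for one prompt, assuming the
# N generated images have already been embedded with a CLIP image encoder
# (e.g., via open_clip or transformers' CLIPModel) and L2-normalized.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import entropy

def diva_score(clip_embeddings: np.ndarray,
               weights=(1 / 3, 1 / 3, 1 / 3),
               n_bins: int = 32) -> float:
    """clip_embeddings: (N, D) array of L2-normalized CLIP image embeddings."""
    # 1) Mean pairwise CLIP embedding distance (cosine distance on unit vectors).
    pairwise = pdist(clip_embeddings, metric="cosine").mean()

    # 2) Feature distribution variance: average per-dimension variance.
    feat_var = clip_embeddings.var(axis=0).mean()

    # 3) Information entropy, estimated from a histogram of the embeddings
    #    projected onto their first principal direction (a simplifying assumption).
    centered = clip_embeddings - clip_embeddings.mean(axis=0)
    first_pc = np.linalg.svd(centered, full_matrices=False)[2][0]
    hist, _ = np.histogram(centered @ first_pc, bins=n_bins, density=True)
    ent = entropy(hist + 1e-12)

    # Combine into a single diversity score (weighting scheme is hypothetical).
    w1, w2, w3 = weights
    return float(w1 * pairwise + w2 * feat_var + w3 * ent)
```

In practice one would call this once per prompt on the batch of images generated for that prompt, and compare scores across prompts or across models; how DIVA actually weights or normalizes the three terms is specified in the paper and repository, not here.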
Submission Number: 10