Feature Likelihood Divergence: Evaluating the Generalization of Generative Models Using Samples

Published: 21 Sept 2023, Last Modified: 15 Jan 2024NeurIPS 2023 posterEveryoneRevisionsBibTeX
Keywords: Generative model, FID, Evaluation, Precision, Recall, Likelihood, Overfitting, Memorization, Generalization, Diffusion, GANs
Abstract: The past few years have seen impressive progress in the development of deep generative models capable of producing high-dimensional, complex, and photo-realistic data. However, current methods for evaluating such models remain incomplete: standard likelihood-based metrics do not always apply and rarely correlate with perceptual fidelity, while sample-based metrics, such as FID, are insensitive to overfitting, i.e., inability to generalize beyond the training set. To address these limitations, we propose a new metric called the Feature Likelihood Divergence (FLD), a parametric sample-based score that uses density estimation to provide a comprehensive trichotomic evaluation accounting for novelty (i.e., different from the training samples), fidelity, and diversity of generated samples. We empirically demonstrate the ability of FLD to identify specific overfitting problem cases, where previously proposed metrics fail. We also extensively evaluate FLD on various image datasets and model classes, demonstrating its ability to match intuitions of previous metrics like FID while offering a more comprehensive evaluation of generative models.
Submission Number: 12682