Feature Likelihood Divergence: Evaluating the Generalization of Generative Models Using Samples

Marco Jiralerspong; Joey Bose; Ian Gemp; Chongli Qin; Yoram Bachrach; Gauthier Gidel

Feature Likelihood Divergence: Evaluating the Generalization of Generative Models Using Samples

Marco Jiralerspong, Joey Bose, Ian Gemp, Chongli Qin, Yoram Bachrach, Gauthier Gidel

Published: 21 Sept 2023, Last Modified: 15 Jan 2024NeurIPS 2023 posterEveryoneRevisionsBibTeX

Keywords: Generative model, FID, Evaluation, Precision, Recall, Likelihood, Overfitting, Memorization, Generalization, Diffusion, GANs

Abstract: The past few years have seen impressive progress in the development of deep generative models capable of producing high-dimensional, complex, and photo-realistic data. However, current methods for evaluating such models remain incomplete: standard likelihood-based metrics do not always apply and rarely correlate with perceptual fidelity, while sample-based metrics, such as FID, are insensitive to overfitting, i.e., inability to generalize beyond the training set. To address these limitations, we propose a new metric called the Feature Likelihood Divergence (FLD), a parametric sample-based score that uses density estimation to provide a comprehensive trichotomic evaluation accounting for novelty (i.e., different from the training samples), fidelity, and diversity of generated samples. We empirically demonstrate the ability of FLD to identify specific overfitting problem cases, where previously proposed metrics fail. We also extensively evaluate FLD on various image datasets and model classes, demonstrating its ability to match intuitions of previous metrics like FID while offering a more comprehensive evaluation of generative models.

Submission Number: 12682

Loading