Keywords: benchmarking, generative models, fid, gan
TL;DR: Fréchet inception distance (FID) is inherently biased: it is robust to deviations in brightness, saturation, and contrast, but sensitive to corruptions that affect edge and texture information.
Abstract: Fréchet inception distance (FID) has established itself as the standard performance measure for generative adversarial networks (GANs). In this paper, we empirically investigate the biases that FID inherits from its underlying design decision of extracting image features with the Inception v3 image classification network. On this basis, we assess how reliable FID is for ranking the performance of GANs. We find that FID is not aligned with human perception, and that exchanging Inception v3 for a different image classification network simply steers the ranking towards different biases.
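
For orientation, the quantity under discussion can be sketched in a few lines. The abstract does not state the formula; the sketch below assumes the standard definition of FID, which fits a Gaussian to each set of features and computes ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^(1/2)). The function name `frechet_distance` and the feature arrays are hypothetical, and the feature extraction step (Inception v3 or any replacement network) is deliberately left out, so this is an illustration rather than the evaluation code used in the paper.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features), assumed to be
    features already extracted by some image classification network.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr((S_a S_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of S_a S_b, which are real and non-negative for
    # covariance matrices; clip tiny negative values caused by round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)
```

Because the distance is computed purely on extracted features, any invariance or sensitivity of the feature network carries over directly to the resulting score, which is the dependence the abstract examines.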