Keywords: Generative Models, Evaluation Metrics, FID
TL;DR: Reference dataset geometry significantly moderates FID, so distributional metrics should be reported alongside dataset geometry rather than treated as generator-only scores.
Abstract: Fréchet Inception Distance (FID) is widely used to evaluate image generators, yet lower FID does not always correspond to better sample quality.
We show that this mismatch depends in part on the geometry of the reference dataset.
In a controlled study across six datasets, distributional density and effective rank significantly explain how FID changes as sample quality improves.
Concentrated datasets tend to yield more favorable FID trends, whereas more dispersed datasets can make FID worsen despite better samples.
An attribution analysis in terms of precision and recall, together with ablations over alternative feature spaces and distances, supports the same conclusion.
These results suggest that distributional metrics should be interpreted together with the geometry of the reference dataset for more reliable benchmarking.
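For readers who want the two quantities side by side, the following is a minimal sketch, not the authors' code. It computes the standard Fréchet distance between Gaussians fitted to two feature sets, plus one common notion of effective rank. The abstract does not define effective rank, so the entropy-based definition used here (exponential of the Shannon entropy of the normalized covariance eigenvalues) is an assumption, as are the function names `frechet_distance` and `effective_rank`. Inputs are generic feature arrays; in actual FID they would be Inception features.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two (n, d) feature arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite inputs (up to numerical noise).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


def effective_rank(feats):
    """Assumed definition: exp of the entropy of normalized covariance eigenvalues."""
    cov = np.cov(feats, rowvar=False)
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    p = eigvals / eigvals.sum()
    p = p[p > 0]  # drop zero eigenvalues so that log(p) stays finite
    return float(np.exp(-(p * np.log(p)).sum()))
```

Under this assumed definition, a reference set whose feature variance is concentrated in a few directions has a low effective rank, while a more dispersed set has a higher one, so reporting it next to the Fréchet distance is one way to give the geometric context the abstract calls for.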
Paper Type: Short (4 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 50