Keywords: Metric, Evaluation, Video Generation, Generative Models
TL;DR: We propose FVD: a new metric for generative models of video based on FID. A large-scale human study confirms that FVD correlates well with qualitative human judgment of generated videos.
Abstract: Recent advances in deep generative models have lead to remarkable progress in synthesizing high quality images. Following their successful application in image processing and representation learning, an important next step is to consider videos. Learning generative models of video is a much harder task, requiring a model to capture the temporal dynamics of a scene, in addition to the visual presentation of objects. While recent generative models of video have had some success, current progress is hampered by the lack of qualitative metrics that consider visual quality, temporal coherence, and diversity of samples. To this extent we propose Fréchet Video Distance (FVD), a new metric for generative models of video based on FID. We contribute a large-scale human study, which confirms that FVD correlates well with qualitative human judgment of generated videos.