Abstract: Video frame interpolation (VFI) synthesizes new frames between original video frames to produce high frame-rate videos and enhance their visual appeal. The quality of these interpolated frames significantly affects the perceptual experience of the synthesized video. Recent VFI research has increasingly focused on the perceptual quality of the interpolated frames and of the overall video. However, most existing quality metrics align poorly with human perception, and interpolated frames often contain unnatural artifacts. Consequently, there is an urgent need for VFI video quality assessment (VFIVQA) methods to assess the quality of synthesized videos. In this paper, we propose both a full-reference (FR) method and a no-reference (NR) method for VFIVQA. The FR method employs two feature extraction blocks to measure continuous frame changes: one extracts flow features over short temporal spans, and the other extracts motion features over long temporal spans. Multilevel similarities between the distorted and reference videos are computed along the temporal dimension of 3D convolutional neural network features, and these similarity features are fused and passed to a quality regression network to obtain the quality score of the VFI video. Since the flow feature extraction block does not utilize the reference video, the proposed NR method consists solely of this block. Extensive validation on several VFIVQA datasets demonstrates that the proposed methods outperform state-of-the-art FR and NR methods.
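The abstract's FR pipeline compares distorted and reference 3D-CNN feature maps level by level along the temporal dimension. As a minimal sketch of that similarity step (the exact similarity measure, fusion, and regression are not specified in the abstract, so this assumes cosine similarity per level, pooled over time; the function name and shapes are illustrative):

```python
import numpy as np

def multilevel_temporal_similarity(feats_dist, feats_ref, eps=1e-8):
    """Hypothetical sketch: for each pyramid level, compute the cosine
    similarity between distorted and reference feature maps per temporal
    slice, then average over time. Each feature map is (C, T, H, W)."""
    sims = []
    for fd, fr in zip(feats_dist, feats_ref):
        # Flatten channel/spatial dims per temporal slice: (T, C*H*W).
        fd_t = fd.transpose(1, 0, 2, 3).reshape(fd.shape[1], -1)
        fr_t = fr.transpose(1, 0, 2, 3).reshape(fr.shape[1], -1)
        num = (fd_t * fr_t).sum(axis=1)
        den = np.linalg.norm(fd_t, axis=1) * np.linalg.norm(fr_t, axis=1) + eps
        sims.append((num / den).mean())  # pool over the temporal dimension
    return np.array(sims)  # one similarity score per feature level
```

The resulting per-level scores would then be fused (e.g. concatenated) before quality regression; an identical pair of feature pyramids yields similarities of 1 at every level.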