Pairwise Comparisons: Background Scene Consistency


Example 1

Human:
Video 1 | Video 2

VB-BG:
Video 1 | Video 2
Score 1 : 0.1030 | Score 2 : 0.1941

MS-Debias (Ours):
Video 1 | Video 2
Score 1 : 0.2032 | Score 2 : 0.1864

Despite extreme distortions in the background scene of Video 2, VB-BG still prefers it over Video 1 because the camera barely moves in the scene. In contrast, our MS-Debias captures more subtle changes in the background, leading to a more accurate assessment.


Example 2

Human:
Video 1 | Video 2

VB-BG:
Video 1 | Video 2
Score 1 : 0.1649 | Score 2 : 0.2821

MS-Debias (Ours):
Video 1 | Video 2
Score 1 : 0.3870 | Score 2 : 0.3303

In Video 2, the building in the background exhibits localized distortions compared to Video 1. VB-BG nevertheless prefers Video 2 because it contains less camera motion.


Example 3

Human:
Video 1 | Video 2

VB-BG:
Video 1 | Video 2
Score 1 : 0.4150 | Score 2 : 0.5055

MS-Debias (Ours):
Video 1 | Video 2
Score 1 : 0.3298 | Score 2 : 0.3079

The background walls in Video 2 tend to slide and distort. As long as the scene content remains inside the frame, these changes may go unnoticed by VB-BG.


Example 4

Human:
Video 1 | Video 2

VB-BG:
Video 1 | Video 2
Score 1 : 0.5531 | Score 2 : 0.6230

MS-Debias (Ours):
Video 1 | Video 2
Score 1 : 0.7075 | Score 2 : 0.5526

The building in Video 2 undergoes a sharp transformation that goes undetected by VB-BG, whereas our metric is sensitive to such pixel-level changes.
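
The preferences implied by the scores in the four examples can be tabulated with a short script. This is only an illustrative sketch: it assumes that, for both metrics, the video with the higher score is the preferred one (consistent with the description in Example 1), and the names `SCORES`, `preferred_video`, and `summarize` are hypothetical helpers, not part of VBench or of our released code.

```python
# Scores copied from the four examples above, as (Score 1, Score 2) per metric.
SCORES = {
    "Example 1": {"VB-BG": (0.1030, 0.1941), "MS-Debias": (0.2032, 0.1864)},
    "Example 2": {"VB-BG": (0.1649, 0.2821), "MS-Debias": (0.3870, 0.3303)},
    "Example 3": {"VB-BG": (0.4150, 0.5055), "MS-Debias": (0.3298, 0.3079)},
    "Example 4": {"VB-BG": (0.5531, 0.6230), "MS-Debias": (0.7075, 0.5526)},
}


def preferred_video(score_1: float, score_2: float) -> int:
    """Return 1 or 2 for the preferred video, or 0 on a tie.

    Assumption: a higher score means the metric prefers that video.
    """
    if score_1 > score_2:
        return 1
    if score_2 > score_1:
        return 2
    return 0


def summarize(scores: dict) -> dict:
    """Map each example to the video preferred by each metric."""
    return {
        example: {
            metric: preferred_video(*pair) for metric, pair in metrics.items()
        }
        for example, metrics in scores.items()
    }


if __name__ == "__main__":
    for example, prefs in summarize(SCORES).items():
        line = ", ".join(f"{m} prefers Video {v}" for m, v in prefs.items())
        print(f"{example}: {line}")
```

Under this assumption, VB-BG prefers Video 2 and MS-Debias prefers Video 1 in all four examples.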