Human:
Video 1 ✔ | Video 2 ✘
VB-BG:
Video 1 ✘ | Video 2 ✔
Score 1 : 0.3532 | Score 2 : 0.7882
MS-Debias (Ours):
Video 1 ✘ | Video 2 ✔
Score 1 : 0.4205 | Score 2 : 0.7999
Very gradual variations in the background (look closely at the tree house details) are not yet captured by our metric.
Human:
Video 1 ✔ | Video 2 ✘
VB-BG:
Video 1 ✘ | Video 2 ✔
Score 1 : 0.1760 | Score 2 : 0.3212
MS-Debias (Ours):
Video 1 ✘ | Video 2 ✔
Score 1 : 0.1988 | Score 2 : 0.5705
A similar case to Example 1.
Human:
Video 1 ✔ | Video 2 ✘
VB-BG:
Video 1 ✘ | Video 2 ✔
Score 1 : 0.1361 | Score 2 : 0.3500
MS-Debias (Ours):
Video 1 ✘ | Video 2 ✔
Score 1 : 0.0683 | Score 2 : 0.6574
Because our metric evaluates consistency objectively, it has limited awareness of what is real. Judging Video 2 correctly requires a deeper understanding of the scene context.
Human:
Video 1 ✔ | Video 2 ✘
VB-BG:
Video 1 ✔ | Video 2 ✘
Score 1 : 0.6904 | Score 2 : 0.6857
MS-Debias (Ours):
Video 1 ✘ | Video 2 ✔
Score 1 : 0.4186 | Score 2 : 0.5092
VB-BG treats deformable regions such as water as consistent, whereas MS-Debias focuses too much on the details of those regions.
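
The ✔/✘ marks in the four examples can be reproduced from the listed scores with a short tally. The sketch below assumes that each metric prefers the video with the higher score, a rule inferred from the marks shown here rather than stated explicitly. The names `examples`, `preferred_video`, and `agreement` are illustrative only.

```python
# Scores copied from the four examples above as (Score 1, Score 2).
# The human annotators preferred Video 1 in every example.
examples = [
    {"human": 1, "VB-BG": (0.3532, 0.7882), "MS-Debias": (0.4205, 0.7999)},
    {"human": 1, "VB-BG": (0.1760, 0.3212), "MS-Debias": (0.1988, 0.5705)},
    {"human": 1, "VB-BG": (0.1361, 0.3500), "MS-Debias": (0.0683, 0.6574)},
    {"human": 1, "VB-BG": (0.6904, 0.6857), "MS-Debias": (0.4186, 0.5092)},
]


def preferred_video(scores):
    """Return 1 or 2 for the video with the higher score (assumed decision rule)."""
    score_1, score_2 = scores
    return 1 if score_1 > score_2 else 2


def agreement(metric):
    """Count the examples in which the metric picks the same video as the humans."""
    return sum(preferred_video(ex[metric]) == ex["human"] for ex in examples)


for metric in ("VB-BG", "MS-Debias"):
    print(f"{metric}: agrees with humans on {agreement(metric)} of {len(examples)} examples")
```

Under this rule, VB-BG matches the human choice only in the last example, where the margin is 0.6904 against 0.6857, and MS-Debias matches in none of the four. That is expected for a set of examples selected as failure cases.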