The video on the left is generated, while the video on the right is the ground truth.