Each video shows two columns: the ground truth (GT) on the left and the reconstruction generated by our Flow-IB framework on the right.