The three files contain visualizations for the Calvin, Droid, and Franka Panda datasets, respectively. Each row shows one sample: the left column is the ground truth, and the right column is the result generated by our model.