For each example, the first row is the result from FILM, the second row is the result from MaskINT, and the third row is the ground truth
