We compare physical-simulation-instructed video generation results from our method (middle-top) against three baseline models: Trajectory-to-Video (right-top), Depth-to-Video (middle-bottom), and Image-to-Video (right-bottom). The shared input image and simulation are shown to the left of the results.
We show physical-simulation-instructed video generation results with autoregressive annealed distributions, comparing our method against vanilla image-to-video inference.