Abstract: This work demonstrates that training autoregressive video diffusion models from a single, continuous video stream is not only possible but remarkably can also be competitive with standard offline training approaches given the same number of gradient steps. Our demonstration further reveals that this main result can be achieved using experience replay that only retains a subset of the preceding video stream. We also contribute three new single video generative modeling datasets suitable for evaluating lifelong video model learning: Lifelong Bouncing Balls, Lifelong 3D Maze, and Lifelong PLAICraft. Each dataset contains over a million consecutive frames from a synthetic environment of increasing complexity.
Loading