Implicit Semi-auto-regressive Image-to-Video Diffusion

15 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Supplementary Material: zip
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: video generation, diffusion model
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Diffusion models have demonstrated exceptional performance across generative domains, particularly image and video generation. Despite this success, image-to-video (I2V) generation remains a formidable challenge for most existing methods. Prior work has primarily modeled the entire video sequence temporally, achieving semantic correspondence with the input image but often losing consistency with its fine details. In this paper, we present a novel temporal recurrent look-back approach for modeling video dynamics that leverages prior information from the first frame (the given image) as an implicit semi-auto-regressive process. By conditioning each frame solely on preceding frames, our approach achieves stronger consistency with the initial frame and avoids unexpected generation results. Furthermore, we introduce a hybrid input initialization strategy to enhance information propagation within the look-back module. Extensive experiments demonstrate that our approach generates video clips with greater detail consistency relative to the provided image.
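The abstract's core idea can be sketched as a rollout loop. The code below is a minimal toy illustration, not the paper's implementation: the blending "denoiser", the `context` window size, and the mixing weights are all hypothetical stand-ins for the diffusion sampler, the look-back module, and the hybrid input initialization described above. It only shows the control flow: each frame is generated conditioned solely on preceding frames, which trace back to the given first frame.

```python
import numpy as np

def toy_lookback_rollout(first_frame, num_frames=8, context=2, seed=0):
    """Toy sketch of an implicit semi-auto-regressive rollout.

    Each new frame is produced conditioned only on the preceding
    `context` frames (never on future frames), so information from
    the given first frame propagates forward step by step. The
    'denoiser' here is a placeholder: it blends the look-back
    context with fresh noise, standing in for a diffusion sampler.
    """
    rng = np.random.default_rng(seed)
    frames = [first_frame]
    for _ in range(num_frames - 1):
        # Look-back conditioning: average the most recent frames.
        cond = np.mean(frames[-context:], axis=0)
        noise = rng.standard_normal(first_frame.shape)
        # Hybrid initialization (placeholder weights): start the next
        # frame from a mix of the conditioning signal and pure noise.
        frames.append(0.7 * cond + 0.3 * noise)
    return np.stack(frames)
```

In an actual I2V diffusion model, the blending line would be replaced by a full iterative denoising process conditioned on the look-back context; the sketch only conveys the frame-by-frame dependency structure.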
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 245