Animer: Generating Editable Videos from Images and Text

03 Sept 2025 (modified: 25 Sept 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Video Generation
Abstract: Our method combines advanced computer vision techniques with natural language processing to create dynamic video content that maintains high visual fidelity while enabling subsequent editing operations. The proposed framework takes as input a collection of images along with descriptive text prompts, and synthesizes coherent video sequences with realistic motion, lighting, and temporal consistency. Key innovations include a multi-modal encoder that effectively fuses visual and textual information, a temporal generation module that ensures smooth frame transitions, and an editing-aware architecture that preserves video structure for post-generation modifications. Extensive experiments demonstrate that our approach outperforms existing methods in terms of video quality, motion realism, and editability metrics. The generated videos maintain semantic consistency with the input descriptions while allowing users to perform various editing operations such as object manipulation, scene modification, and style transfer without degrading visual quality. This work opens new possibilities for content creation applications in entertainment, education, and digital media production.
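The abstract describes a multi-modal encoder that fuses visual and textual information before temporal generation. The paper's actual architecture is not given here, so the following is only a minimal illustrative sketch of one common fusion pattern (single-head cross-attention from image-patch embeddings to text-token embeddings); all names, shapes, and the residual design are hypothetical, not the authors' method.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(img_tokens, txt_tokens):
    """Fuse image tokens with text tokens via single-head cross-attention.

    img_tokens: (N_img, d) image-patch embeddings (queries)
    txt_tokens: (N_txt, d) text-token embeddings (keys/values)
    Returns fused tokens of shape (N_img, d).
    """
    d = img_tokens.shape[-1]
    scores = img_tokens @ txt_tokens.T / np.sqrt(d)   # (N_img, N_txt)
    weights = softmax(scores, axis=-1)                # attention over text tokens
    return img_tokens + weights @ txt_tokens          # residual fusion

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))   # 4 image patches, embedding dim 8
txt = rng.normal(size=(6, 8))   # 6 text tokens, same dim
fused = cross_attention_fuse(img, txt)
print(fused.shape)  # (4, 8)
```

In a full system of the kind the abstract sketches, the fused tokens would then condition a temporal generation module that produces frame sequences; that stage is omitted here.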
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2026/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1218