Animer: Generating Editable Videos from Images and Text

03 Sept 2025 (modified: 25 Sept 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Video Generation
Abstract: Our method combines advanced computer vision techniques with natural language processing to create dynamic video content that maintains high visual fidelity while enabling subsequent editing operations. The proposed framework takes as input a collection of images along with descriptive text prompts, and synthesizes coherent video sequences with realistic motion, lighting, and temporal consistency. Key innovations include a multi-modal encoder that effectively fuses visual and textual information, a temporal generation module that ensures smooth frame transitions, and an editing-aware architecture that preserves video structure for post-generation modifications. Extensive experiments demonstrate that our approach outperforms existing methods in terms of video quality, motion realism, and editability metrics. The generated videos maintain semantic consistency with the input descriptions while allowing users to perform various editing operations such as object manipulation, scene modification, and style transfer without degrading visual quality. This work opens new possibilities for content creation applications in entertainment, education, and digital media production.
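The abstract describes a multi-modal encoder that fuses visual and textual information before temporal generation. The paper's actual architecture is not given here, so the following is only a minimal illustrative sketch of one common fusion pattern (single-head cross-attention from image-patch embeddings to text-token embeddings); all names, shapes, and the residual design are hypothetical, not the authors' method.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(img_tokens, txt_tokens):
    """Fuse image tokens with text tokens via single-head cross-attention.

    img_tokens: (N_img, d) image-patch embeddings (queries)
    txt_tokens: (N_txt, d) text-token embeddings (keys/values)
    Returns fused tokens of shape (N_img, d).
    """
    d = img_tokens.shape[-1]
    scores = img_tokens @ txt_tokens.T / np.sqrt(d)   # (N_img, N_txt)
    weights = softmax(scores, axis=-1)                # attention over text tokens
    return img_tokens + weights @ txt_tokens          # residual fusion

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))   # 4 image patches, embedding dim 8
txt = rng.normal(size=(6, 8))   # 6 text tokens, same dim
fused = cross_attention_fuse(img, txt)
print(fused.shape)  # (4, 8)
```

In a full system of the kind the abstract sketches, the fused tokens would then condition a temporal generation module that produces frame sequences; that stage is omitted here.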
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2026/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1218