Data-free VFX Self-Mining

10 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Visual effects, Video generation, Agent
TL;DR: We present AutoVFX, an automated framework for extracting and amplifying visual-effects (VFX) capabilities from pretrained Image-to-Video (I2V) foundation models, thereby obviating costly manual dataset construction and annotation.
Abstract: We present AutoVFX, an automated framework for extracting and amplifying visual-effects (VFX) capabilities from pretrained Image-to-Video (I2V) foundation models, thereby obviating costly manual dataset construction and annotation. Motivated by the observation that contemporary I2V models possess latent but unreliable VFX competence, we operationalize a closed-loop agent composed of four coordinated modules: \textbf{\textit{i)}} VFX Designer: structured prompt exploration and decomposition via an LLM; \textbf{\textit{ii)}} Scene Artist: VFX-aware first-frame synthesis using state-of-the-art text-to-image models and automated image selection; \textbf{\textit{iii)}} Video Producer: I2V synthesis with multimodal per-clip evaluation (perceptual quality metrics and semantic consistency); and \textbf{\textit{iv)}} VFX Refiner: selective data curation and cycle-finetuning of the I2V backbone. Central to our approach are a scalable multimodal quality controller that enforces both per-frame aesthetic fidelity and per-clip semantic alignment, and a cycle-finetuning regime that iteratively improves training data and model behavior. To assess performance, we introduce VFX-Bench, a diverse suite of challenging VFX tasks, and report two complementary metrics, Comprehensive Score and Success Rate. Empirical evaluation demonstrates that AutoVFX substantially improves performance relative to off-the-shelf I2V baselines, yields favorable scalability and cost profiles compared to manual dataset construction, and outperforms several VFX-tailored baselines. All data and code will be made publicly available.
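
The closed loop described above can be summarized in a short sketch. The snippet below is illustrative only: the class and function names (StubModel, first_frame, generate_clip, score, finetune, run_cycle) and the quality-gate thresholds are assumptions made for exposition, not the paper's actual API or hyperparameters.

```python
"""Minimal sketch of the closed-loop agent described in the abstract.
All names and thresholds are hypothetical stand-ins, not the authors' API."""
from dataclasses import dataclass
import random

@dataclass
class Clip:
    prompt: str
    frames: list
    aesthetic: float   # per-frame perceptual quality, averaged over the clip
    semantic: float    # per-clip semantic alignment with the prompt

class StubModel:
    """Placeholder for the T2I / I2V backbones and the quality scorers."""
    def first_frame(self, prompt):            # Scene Artist: VFX-aware first frame
        return f"image({prompt})"
    def generate_clip(self, frame, prompt):   # Video Producer: I2V synthesis
        return [frame] * 16
    def score(self, frames, prompt):          # multimodal quality controller
        return random.random(), random.random()
    def finetune(self, clips):                # VFX Refiner: cycle-finetuning
        print(f"finetuning on {len(clips)} curated clips")

def run_cycle(effect, model, n=8, aes_thresh=0.6, sem_thresh=0.7):
    # VFX Designer: LLM-driven prompt exploration/decomposition (stubbed here)
    prompts = [f"{effect}, variation {i}" for i in range(n)]
    curated = []
    for p in prompts:
        frame = model.first_frame(p)
        frames = model.generate_clip(frame, p)
        aes, sem = model.score(frames, p)
        # keep a clip only if it passes both the aesthetic and semantic gates
        if aes >= aes_thresh and sem >= sem_thresh:
            curated.append(Clip(p, frames, aes, sem))
    model.finetune(curated)   # cycle-finetune the I2V backbone on curated data
    return curated

if __name__ == "__main__":
    run_cycle("object dissolving into sand", StubModel())
```

In this reading, each pass of run_cycle corresponds to one curation-and-finetuning iteration; repeating the loop with the updated backbone is what the abstract refers to as cycle-finetuning.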
Supplementary Material: zip
Primary Area: generative models
Submission Number: 3554