CCC: Prompt Evolution for Video Generation via Structured MLLM Feedback

ICLR 2026 Conference Submission 9642 Authors

17 Sept 2025 (modified: 08 Oct 2025) | ICLR 2026 Conference Submission | CC BY 4.0
Keywords: video generation, prompt evolution, multimodal large language models
TL;DR: We propose Critique Coach Calibration (CCC), a training-free prompt evolution framework where MLLMs iteratively critique and refine prompts to improve video generation.
Abstract: Video generation from natural-language prompts has made impressive strides, but current systems frequently produce outputs that are misaligned with their input descriptions: they drop critical details, hallucinate unintended content, and often violate basic physical plausibility. Existing approaches to improving video quality typically rely either on heavyweight post-editing models, which may introduce new artifacts, or on costly fine-tuning of the generator backbone, which limits scalability and accessibility. We introduce Critique Coach Calibration (CCC), a training-free, model-agnostic framework for iterative prompt evolution that closes the gap between direct generation and reality. In each cycle, an off-the-shelf multimodal large language model (MLLM) produces a structured critique of a generated video, identifying semantic misalignments, subject drift, and missing objects, and then reformulates the input prompt based on this feedback. By repeatedly evolving prompts through this critique–coach loop, CCC steadily improves both video quality and adherence to everyday physics, without modifying the generator or relying on external editing modules. Empirical results on different video models show that CCC consistently enhances semantic alignment and visual quality.
Primary Area: generative models
Submission Number: 9642
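
The critique–coach loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `Critique` fields simply mirror the three issue types the abstract names, and `evolve_prompt`, the `generate`, `critique`, and `refine` callables, the `max_rounds` cap, the early stop on a clean critique, and the choice to critique against the original prompt are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class Critique:
    """Structured feedback. Field names are hypothetical and mirror the
    issue types listed in the abstract."""
    semantic_misalignments: List[str] = field(default_factory=list)
    subject_drift: List[str] = field(default_factory=list)
    missing_objects: List[str] = field(default_factory=list)

    def is_clean(self) -> bool:
        # True when the critic reported no issues of any kind.
        return not (
            self.semantic_misalignments
            or self.subject_drift
            or self.missing_objects
        )


def evolve_prompt(
    prompt: str,
    generate: Callable[[str], Any],            # frozen video generator (assumed interface)
    critique: Callable[[str, Any], Critique],  # MLLM acting as critic (assumed interface)
    refine: Callable[[str, Critique], str],    # MLLM acting as coach (assumed interface)
    max_rounds: int = 3,                       # assumed stopping rule, not from the source
) -> Tuple[str, Any, List[Tuple[str, Critique]]]:
    """Iteratively rewrite `prompt`; the generator itself is never modified."""
    original = prompt
    history: List[Tuple[str, Critique]] = []
    video = generate(prompt)
    for _ in range(max_rounds):
        # Assumption: the critic compares the video against the original intent.
        feedback = critique(original, video)
        history.append((prompt, feedback))
        if feedback.is_clean():
            break
        # The coach reformulates the current prompt using the structured critique.
        prompt = refine(prompt, feedback)
        video = generate(prompt)
    return prompt, video, history
```

Passing the generator and the MLLM calls in as plain callables keeps the sketch model-agnostic in the sense the abstract describes: any text-to-video backend or MLLM could be substituted without touching the loop.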