OmniPainter: Global-Local Temporally Consistent Video Inpainting Diffusion Model

ICLR 2026 Conference Submission 16413 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Diffusion Model, Video Inpainting, Temporal Consistency
Abstract: Video inpainting methods often fail to resolve the inherent trade-off between long-term global consistency and short-term local smoothness, leading to artifacts such as contextual drift and flickering. We introduce OmniPainter, a latent diffusion framework designed to address this limitation. Our framework is built on two core innovations: a Flow-Guided Ternary Control mechanism for superior structural fidelity, and a novel Adaptive Global-Local Guidance strategy. This guidance strategy dynamically blends two complementary guidance scores at each denoising step: an autoregressive score to enforce local transitional smoothness, and a hierarchical score to maintain long-range global coherence. The blending weight is determined by a function of both the video's motion dynamics and the current diffusion timestep. This adaptive blending allows the model to prioritize global structure during the early stages of generation and then shift focus to local continuity during later refinement stages, thereby achieving a robust temporal equilibrium. Extensive experiments confirm that OmniPainter significantly outperforms state-of-the-art methods, setting a new standard for temporally consistent video restoration.
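For intuition, the adaptive blending described in the abstract can be read as a convex combination of the two guidance scores, with a weight that is scheduled over denoising timesteps and damped by motion. The sketch below is a minimal illustration under assumed names and an assumed functional form: `blend_weight`, `adaptive_guidance`, the power-law timestep schedule, and the motion damping term are all hypothetical stand-ins, since the abstract does not specify the actual weighting function.

```python
def blend_weight(t: int, num_steps: int, motion: float, alpha: float = 2.0) -> float:
    """Hypothetical blending weight w in [0, 1] for the global score.

    t counts down from num_steps (pure noise) to 0 (clean sample), so
    w starts near 1 (favoring the hierarchical/global score early) and
    decays toward 0 (favoring the autoregressive/local score during
    late refinement). Higher motion further shifts weight toward the
    local score. The exact form here is an illustrative guess, not the
    paper's definition.
    """
    schedule = (t / num_steps) ** alpha   # 1 -> 0 as denoising proceeds
    damping = 1.0 / (1.0 + motion)        # more motion -> smaller global weight
    return min(1.0, max(0.0, schedule * damping))


def adaptive_guidance(score_local, score_global, t, num_steps, motion):
    """Convex blend of the two guidance scores at one denoising step."""
    w = blend_weight(t, num_steps, motion)
    return w * score_global + (1.0 - w) * score_local
```

Any schedule that is monotone in the timestep and sensitive to motion would realize the same qualitative behavior the abstract claims: global structure dominates early, local continuity dominates late.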
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 16413