HMDST: A Hybrid Model-Data Driven Approach for Spatio-Temporally Consistent Video Inpainting

Li Fang, Kaijun Zou, Zhiye Chen, Long Ye

Published: 2024, Last Modified: 18 Apr 2025, ICME 2024, CC BY-SA 4.0

Abstract: Video inpainting fills in the missing regions of a video with content that should be coherent and natural. Conventional model-driven methods that rely on hand-crafted priors may produce blurry, distorted, or inconsistent results because they lack high-level semantics. Data-driven approaches such as deep learning directly learn mappings from observations to target videos, but they may face challenges in quality, generalization, and robustness. Existing methods use optical flow to model motion and context between frames, but flow estimation may be inaccurate or unstable in the missing regions. This paper introduces a hybrid model-data driven approach for spatio-temporally consistent video inpainting. It combines prior knowledge of imaging with a deep prior learned through training, and employs dedicated modules to model motion and contextual information accurately. Our network can be trained end-to-end, leading to a more efficient and effective inpainting process. Extensive experiments demonstrate the superiority of our method both qualitatively and quantitatively.