V-LASIK: Consistent Glasses-Removal from Videos Using Synthetic Data

Published: 04 Mar 2025, Last Modified: 17 Apr 2025 · ICLR 2025 Workshop SynthData · CC BY 4.0
Keywords: Video editing, Video inpainting, Weak supervision, Synthetic data
Abstract: Recently, diffusion-based generative models have shown remarkable image and video editing capabilities. However, local video editing, particularly the removal of small attributes such as glasses, remains a challenge. Existing methods either alter the video excessively, generate unrealistic artifacts, or fail to perform the requested edit consistently. In this work, we focus on consistent and identity-preserving glasses removal in videos as a case study for local attribute removal in video. We demonstrate the generalizability of our method by applying it to facial sticker removal from videos. Due to the lack of paired data, we adopt a weakly supervised approach and generate synthetic, imperfect data using an adjusted pretrained diffusion model. We show that, despite the data's imperfection, by learning from our generated data and leveraging the prior of pretrained models, our model is able to perform the desired edit consistently while preserving the original video content. Furthermore, we propose a new normalization method, Inside-Out Normalization, which aligns the colors in a filled region with the colors outside that region. Our approach offers a significant improvement over existing methods, showcasing the potential of leveraging synthetic data and strong priors for local video editing tasks.
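The abstract does not specify how Inside-Out Normalization is computed; one plausible interpretation of "aligning colors in a filled region with colors outside that region" is matching per-channel mean and standard deviation of the masked (inpainted) area to the surrounding pixels. The sketch below illustrates that interpretation only — the function name, mask convention, and statistic choice are assumptions, not the paper's actual formulation:

```python
import numpy as np

def inside_out_normalize(image, mask, eps=1e-6):
    """Hypothetical sketch: shift and scale each channel of the filled
    (masked) region so its mean/std match the region outside the mask.

    image: float array of shape (H, W, C)
    mask:  bool array of shape (H, W); True marks the filled region
    """
    out = image.astype(np.float64).copy()
    inside = mask.astype(bool)
    outside = ~inside
    for c in range(out.shape[-1]):
        ch = out[..., c]
        mu_in, sd_in = ch[inside].mean(), ch[inside].std()
        mu_out, sd_out = ch[outside].mean(), ch[outside].std()
        # Standardize the inside values, then re-scale to outside statistics.
        ch[inside] = (ch[inside] - mu_in) / (sd_in + eps) * sd_out + mu_out
        out[..., c] = ch
    return out
```

Such a statistics-matching step would reduce color seams between generated content and the untouched surroundings; the paper's actual method may differ in how the alignment is computed or where it is applied.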
Submission Number: 29