FreeEyeglass: Training-free and Mask-free Eyeglass Transfer for Facial Videos

17 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: video editing, eyeglasses, facial video, virtual try-on
TL;DR: A training-free approach based on diffusion autoencoder to transfer desired eyeglasses semantically to facial videos
Abstract: The rise of e-commerce and short-video platforms has fueled demand for realistic video-based virtual try-on. Unlike virtual try-on of clothing, which has been studied extensively, virtual try-on of eyeglasses is uniquely challenging: eyeglasses physically interact with facial geometry and strongly affect perceived identity, making faithful preservation of unedited regions especially important. Existing generative editing approaches, such as GAN- and diffusion-based methods, lack reconstruction objectives and often rely on inpainting, which fails to ensure identity consistency. We argue that semantic editing requires not only plausible generation but also faithful reconstruction, making autoencoder-based latent spaces particularly suitable. We introduce a training-free, reference-guided framework for video eyeglass transfer built on Diffusion Autoencoders (DiffAE). By blending semantic features in the encoder and incorporating spatial-temporal self-attention, our method achieves realistic, identity-preserving, and temporally consistent results, and points to the potential of autoencoder-based latent spaces for local video editing. Our implementations and datasets will be released upon acceptance.
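The abstract describes editing by blending semantic features in the DiffAE encoder's latent space. A minimal sketch of what such semantic-latent blending could look like is below; the function name, latent dimensionality, and blend weight are illustrative assumptions, not details from the paper:

```python
import numpy as np

def blend_semantic_latents(z_face, z_ref, alpha=0.6):
    """Hypothetical sketch: linearly interpolate a face's semantic
    latent toward a reference (eyeglass-wearing) latent.
    alpha controls transfer strength; the blended code would then
    condition the diffusion decoder to render the edited frame."""
    return (1.0 - alpha) * z_face + alpha * z_ref

# Toy latents standing in for DiffAE semantic codes (e.g. 512-d).
z_face = np.zeros(512)   # source frame's semantic code
z_ref = np.ones(512)     # reference image's semantic code
z_edit = blend_semantic_latents(z_face, z_ref, alpha=0.6)
```

In practice the actual method is training-free and reference-guided, so the blend would be applied per frame, with the paper's spatial-temporal self-attention enforcing consistency across frames.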
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 8482