Beyond Identity: High-Fidelity Face Swapping by Preserving Source Video Attributes

Zekai Luo; Zongze Du; Zhouhang Zhu; Hao Zhong; Muzhi Zhu; Wen Wang; Yuling Xi; Chenchen Jing; Hao Chen; Chunhua Shen

03 Sept 2025 (modified: 14 Nov 2025) | ICLR 2026 Conference Withdrawn Submission | CC BY 4.0

Keywords: diffusion model

Abstract: Video face swapping is crucial in film and entertainment production, yet achieving high fidelity and temporal consistency over long, complex video sequences remains a significant challenge. Inspired by recent advances in reference-guided image editing, we explore whether the rich visual attributes of source videos can be leveraged in the same way to improve both fidelity and temporal coherence in video face swapping. This work presents LivingFace, the first video-reference-guided face swapping model. Our approach uses keyframes as conditioning signals to inject the target identity, enabling flexible and controllable editing. By combining keyframe conditioning with video reference guidance, the model performs temporal stitching that keeps the identity stable and the reconstruction high-fidelity across long video sequences. To address the scarcity of data for reference-guided training, we construct a paired face-swapping dataset, Face2FaceSwap, in which generated videos serve as inputs and the corresponding original videos serve as ground truth, enabling reliable supervision. Extensive experiments demonstrate that our method achieves state-of-the-art results, seamlessly integrating the target identity with the source video's expressions, lighting, and motion while significantly reducing manual effort in production workflows.
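The data recipe described in the abstract (generated videos as inputs, original videos as ground truth) can be illustrated with a minimal Python sketch. This is not the authors' Face2FaceSwap pipeline. The frame representation, `swap_face`, `toy_swap`, `build_training_pair`, `SwapTrainingPair`, and the choice to take the identity keyframe from the original video are all hypothetical illustrations chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical: a "frame" is a flat list of pixel values, for illustration only;
# a real pipeline would use image arrays or tensors.
Frame = List[float]


@dataclass
class SwapTrainingPair:
    """One reference-guided training example built by reversing a face swap."""
    model_input: List[Frame]   # generated (face-swapped) video fed to the model
    identity_keyframe: Frame   # keyframe carrying the identity to restore (assumption)
    ground_truth: List[Frame]  # original, unedited video used as the supervision target


def build_training_pair(
    original_video: Sequence[Frame],
    donor_identity: Frame,
    swap_face: Callable[[Frame, Frame], Frame],
    keyframe_index: int = 0,
) -> SwapTrainingPair:
    """Pair a synthetic swapped video (input) with its real source (target).

    `swap_face` is a hypothetical stand-in for any existing face swapper that
    replaces the face in a frame with `donor_identity`.
    """
    if not original_video:
        raise ValueError("original_video must contain at least one frame")
    # Generated data: every original frame is swapped to a different identity.
    swapped = [swap_face(frame, donor_identity) for frame in original_video]
    # Assumption: the identity keyframe comes from the original video, so the
    # model learns to restore the original face onto the swapped clip.
    keyframe = list(original_video[keyframe_index])
    # Original data: copied unchanged to serve as the ground truth.
    target = [list(frame) for frame in original_video]
    return SwapTrainingPair(swapped, keyframe, target)


def toy_swap(frame: Frame, donor: Frame) -> Frame:
    """Toy swapper for demonstration only: averages each pixel with the donor."""
    return [(p + d) / 2 for p, d in zip(frame, donor)]


video = [[0.0, 0.2], [0.1, 0.3], [0.2, 0.4]]
pair = build_training_pair(video, donor_identity=[1.0, 1.0], swap_face=toy_swap)
print(pair.model_input[0])           # swapped first frame
print(pair.ground_truth == video)    # → True
```

Under this reading, the swapped clip keeps the original video's expressions, lighting, and motion, so the original frames provide exact pixel-level supervision for restoring the identity.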

Supplementary Material: zip

Primary Area: generative models

Submission Number: 1257

Loading