Abstract: Real-time makeup virtual try-on (VTO) on resource-constrained platforms like mobile devices and web browsers demands a delicate balance: models must be accurate enough for realistic results yet lightweight and fast enough for smooth performance. Existing approaches often rely on separate models for facial landmark detection and occlusion-aware segmentation, increasing complexity and hindering real-time performance. To address this, we propose a novel, unified model that performs both tasks within a single, highly efficient architecture. Specifically designed for VTO, our model offers enhanced accuracy around critical areas like the eyes and lips. We further optimize for real-time performance by leveraging temporal information: predictions from previous video frames guide current predictions, increasing parallelism and reducing inference time to as little as 16ms on an iPhone 14. Trained with a simplified pipeline, our unified model achieves accuracy comparable to state-of-the-art lightweight alignment models while maintaining a small footprint.