Keywords: Appearance-based gaze estimation
TL;DR: We propose a mechanism that replaces the image normalization (rectification) procedure commonly adopted for gaze estimation, and combine it with multi-task learning of head pose to compensate for the explicit head-pose estimation that rectification would otherwise provide.
Abstract: Recent advances in appearance-based gaze estimation have adopted deep learning models to directly map face images to 3D gaze directions, but most existing methods rely on face normalization processes, which are costly and error-prone in unconstrained environments.
While normalization-free approaches have been explored to address these challenges, they either discard the advantages of normalization in reducing appearance variability or lack a systematic understanding of the transformations involved.
We revisit this problem and formalize crop-based gaze estimation through Constrained Rotation Optimization (CROp), which models face cropping as a virtual camera rotation and defines a consistent mapping between crop and camera coordinates.
We further adopt multi-task learning to jointly estimate gaze and head pose, improving robustness without requiring explicit landmark-based preprocessing.
Through extensive evaluation, we show that crop-based estimation, when treated rigorously, is a reliable alternative to normalization, especially under extreme head poses and noisy preprocessing.
Our analysis highlights the trade-offs between the two approaches and offers practical guidelines for effective and robust gaze estimation in real-world, unconstrained settings.
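The core geometric idea, modeling a face crop as a virtual camera rotation, can be sketched with a standard pinhole-camera construction. This is a generic illustration, not the paper's CROp implementation; the intrinsics `K`, the crop center, and the up-vector choice are illustrative assumptions:

```python
import numpy as np

def rotation_to_center(K, crop_center):
    """Rotation that points the virtual optical axis at crop_center (pixels)."""
    # Back-project the crop center to a viewing ray in camera coordinates.
    ray = np.linalg.inv(K) @ np.array([crop_center[0], crop_center[1], 1.0])
    z_new = ray / np.linalg.norm(ray)            # new optical axis
    # Keep the virtual camera roughly upright (assumed up-vector [0, 1, 0]).
    x_new = np.cross(np.array([0.0, 1.0, 0.0]), z_new)
    x_new /= np.linalg.norm(x_new)
    y_new = np.cross(z_new, x_new)
    # Rows are the new camera axes expressed in the original camera frame.
    return np.stack([x_new, y_new, z_new])

# Illustrative intrinsics for a 1280x720 camera (not from the paper).
K = np.array([[960.0,   0.0, 640.0],
              [  0.0, 960.0, 360.0],
              [  0.0,   0.0,   1.0]])

R = rotation_to_center(K, (800.0, 300.0))
# Homography mapping original image pixels into the rotated (crop) view.
H = K @ R @ np.linalg.inv(K)
# The crop center lands on the virtual principal point:
p = H @ np.array([800.0, 300.0, 1.0])
print(p[:2] / p[2])  # ≈ (640, 360)
```

Warping the input with `H` (e.g. `cv2.warpPerspective`) re-centers the face without translating the camera, which is why a crop can be given a consistent camera-coordinate interpretation: the same rotation `R` that defines the crop also rotates gaze and head-pose labels between the two coordinate frames.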
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 11779