SG-Gaze: Structurally and Geometrically Consistent Representation Learning for Generalizable 3D Gaze Estimation
Keywords: 3D Gaze Estimation, Gaze Representation Learning, Eye Model Reconstruction, Geometric and Structural Consistency
TL;DR: A dual-branch framework (SG-Gaze) learns structurally and geometrically consistent gaze representations under view transformations through adversarial alignment, achieving state-of-the-art accuracy and strong cross-domain generalization.
Abstract: Learning accurate and generalizable 3D gaze representations remains challenging due to the lack of a unified and physically grounded representation. Existing methods rely solely on appearance cues or simplified geometric modeling, and thus fail to jointly capture geometric and structural consistency.
They exhibit poor cross-domain generalization and typically require large-scale multiview datasets to mitigate viewpoint variation, yet still struggle with the domain gap between controlled and in-the-wild settings.
To address these issues, we propose $\textbf{SG-Gaze}$, a dual-branch framework that learns a $\textbf{S}$tructurally and $\textbf{G}$eometrically Consistent $\textbf{R}$epresentation $\textbf{(SGR)}$ for gaze estimation.
The analytical branch embeds features onto a geodesically aligned spherical manifold for interpretable regression, while the model-guided branch reconstructs 3D eyeball structure with weak 2D edge supervision. Through adversarial alignment, the resulting SGR is simultaneously appearance discriminative, structurally faithful, and geometrically consistent.
To further improve robustness, we introduce View-Consistent Regularization, which augments the SGR during training with synthetic view perturbations and enforces rotation-equivariant consistency across gaze vectors and structural projections. This reduces reliance on costly multiview data and mitigates cross-domain distribution shifts.
Extensive experiments on synthetic and real-world datasets show that SG-Gaze achieves state-of-the-art accuracy and strong cross-domain generalization across 12 challenging transfer settings. Our work demonstrates the importance of unifying structural and geometric consistency with equivariant regularization, offering broader insight into building more interpretable and generalizable models.
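The rotation-equivariant consistency in View-Consistent Regularization can be illustrated with a minimal sketch: a synthetic view perturbation (a rotation R of the input) should rotate the predicted gaze vector by the same R. All names below (`predict_gaze`, the toy linear estimator, the z-axis perturbation) are illustrative assumptions, not the paper's actual architecture or loss.

```python
import numpy as np

def rotation_z(theta):
    """3x3 rotation about the z-axis, standing in for a synthetic view perturbation."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def predict_gaze(x, W):
    """Toy linear 'gaze estimator': maps a 3-D feature to a unit gaze vector."""
    g = W @ x
    return g / np.linalg.norm(g)

def view_consistency_loss(x, W, R):
    """Penalize || f(Rx) - R f(x) ||: zero exactly when f commutes with R."""
    return np.linalg.norm(predict_gaze(R @ x, W) - R @ predict_gaze(x, W))

x = np.array([0.3, -0.5, 0.8])
R = rotation_z(0.4)

# An estimator that commutes with R (here, the identity map followed by
# normalization) incurs zero penalty.
loss_equivariant = view_consistency_loss(x, np.eye(3), R)

# A generic anisotropic map does not commute with R, so the penalty is positive
# and training would push the estimator toward equivariance.
loss_generic = view_consistency_loss(x, np.diag([1.0, 2.0, 3.0]), R)
```

The same pattern extends to the structural projections mentioned above: one applies the known view transformation to both the input and the target quantity and penalizes the mismatch, so no additional multiview capture is needed.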
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 15568