SG-Gaze: Structurally and Geometrically Consistent Representation Learning for Generalizable 3D Gaze Estimation
Keywords: 3D Gaze Estimation, Gaze Representation Learning, Eye Model Reconstruction, Geometric and Structural Consistency
TL;DR: A dual-branch framework (SG-Gaze) learns structurally and geometrically consistent gaze representations under view transformations through adversarial alignment, achieving state-of-the-art accuracy and strong cross-domain generalization.
Abstract: Learning accurate and generalizable 3D gaze representations remains challenging due to the lack of a unified and physically meaningful representation. Existing methods rely on either appearance features or simplified geometric modeling, but fail to jointly capture geometric and structural consistency.
They exhibit poor cross-domain generalization and typically require large-scale multiview datasets to mitigate viewpoint variation, yet still struggle with domain shifts between controlled and in-the-wild settings.
To address these issues, we propose $\textbf{SG-Gaze}$, a dual-branch framework that learns a $\textbf{S}$tructurally and $\textbf{G}$eometrically Consistent $\textbf{R}$epresentation $\textbf{(SGR)}$ for gaze estimation.
The analytical branch embeds features into a geodesically aligned spherical manifold for interpretable regression, while the model-guided branch reconstructs the 3D eyeball structure under weak 2D edge supervision. Through adversarial training, the resulting SGR is simultaneously appearance-discriminative, structurally faithful, and geometrically consistent.
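For intuition, geodesic alignment on the unit sphere amounts to penalizing the angular distance between predicted and ground-truth 3D gaze vectors. The sketch below illustrates such a loss under common conventions (unit-norm gaze vectors in PyTorch); it is a minimal illustration, not the paper's implementation, and all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def geodesic_gaze_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Mean geodesic (angular) distance on the unit sphere, in radians.

    pred, target: (B, 3) gaze direction vectors (not necessarily normalized).
    """
    pred = F.normalize(pred, dim=-1)      # project predictions onto S^2
    target = F.normalize(target, dim=-1)  # ensure targets are unit vectors
    cos = (pred * target).sum(dim=-1)     # cosine of the angle between them
    cos = cos.clamp(-1.0 + 1e-7, 1.0 - 1e-7)  # keep acos numerically stable
    return torch.acos(cos).mean()
```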
To further improve robustness, we introduce View-Consistent Regularization, which augments the SGR during training with synthetic view perturbations and enforces rotation-equivariant consistency across gaze vectors and structural projections. This reduces reliance on costly multiview data and narrows cross-domain gaps.
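A rotation-equivariant consistency term can be read as: if a training view is perturbed by a known rotation R, the gaze predicted from the perturbed view should match R applied to the gaze predicted from the original view. The sketch below, assuming a model that maps an eye image to a 3D gaze vector, shows one plausible form of this penalty for the gaze branch (the paper additionally aligns structural projections); all interfaces here are assumptions, not the authors' code.

```python
import torch

def view_consistency_loss(model, image, perturbed_image, R):
    """Hypothetical rotation-equivariance penalty.

    image, perturbed_image: (B, C, H, W), the original and synthetically
        view-perturbed eye images; R: (B, 3, 3) rotation relating the views.
    """
    g = model(image)                  # (B, 3) gaze from the original view
    g_pert = model(perturbed_image)   # (B, 3) gaze from the perturbed view
    g_expected = torch.bmm(R, g.unsqueeze(-1)).squeeze(-1)  # R @ g
    # Penalize angular deviation between predicted and expected gaze.
    return (1.0 - torch.cosine_similarity(g_pert, g_expected, dim=-1)).mean()
```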
Extensive experiments on synthetic and real-world datasets show that SG-Gaze achieves state-of-the-art accuracy and strong cross-domain generalization across 12 challenging transfer scenarios. Our work demonstrates the importance of unifying a structurally and geometrically consistent representation with equivariant regularization, offering broader insight into building more interpretable and generalizable models.
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 15568