Bootstrap AutoEncoders With Contrastive Paradigm for Self-supervised Gaze Estimation

Yaoming Wang; Jin Li; Wenrui Dai; Bowen Shi; XIAOPENG ZHANG; Chenglin Li; Hongkai Xiong

Bootstrap AutoEncoders With Contrastive Paradigm for Self-supervised Gaze Estimation

Yaoming Wang, Jin Li, Wenrui Dai, Bowen Shi, XIAOPENG ZHANG, Chenglin Li, Hongkai Xiong

Published: 02 May 2024, Last Modified: 25 Jun 2024ICML 2024 PosterEveryoneRevisionsBibTeXCC BY 4.0

Abstract: Existing self-supervised methods for gaze estimation using the dominant streams of contrastive and generative approaches are restricted to eye images and could fail in general full-face settings. In this paper, we reveal that contrastive methods are ineffective in data augmentation for self-supervised full-face gaze estimation, while generative methods are prone to trivial solutions due to the absence of explicit regularization on semantic representations. To address this challenge, we propose a novel approach called **B**ootstrap auto-**e**ncoders with **C**ontrastive p**a**radigm (**BeCa**), which combines the strengths of both generative and contrastive methods. Specifically, we revisit the Auto-Encoder used in generative approaches and incorporate the contrastive paradigm to introduce explicit regularization on gaze representation. Furthermore, we design the InfoMSE loss as an alternative to the vanilla MSE loss for Auto-Encoder to mitigate the inconsistency between reconstruction and representation learning. Experimental results demonstrate that the proposed approaches outperform state-of-the-art unsupervised gaze approaches on extensive datasets (including wild scenes) under both within-dataset and cross-dataset protocols.

Submission Number: 1193

Loading