Looking to Personalize Gaze Estimation Using Transformers

Published: 01 Jan 2023, Last Modified: 15 May 2025, J. Comput. Sci. Eng. 2023, CC BY-SA 4.0
Abstract: Anatomical differences between people limit the accuracy of appearance-based gaze estimation. These differences can be accounted for with few-shot adaptation approaches that optimize the model per user. However, such approaches incur additional computational cost and are vulnerable to corrupt data inputs, which restricts the use of accurate gaze estimation in real-world scenarios. To solve this problem, we introduce a novel and robust gaze estimation calibration framework called personal transformer-based gaze estimation (PTGE), which uses a deep learning network separate from the gaze estimation model to adapt to new users. This network learns to model person-specific differences in gaze estimation as a low-dimensional latent vector estimated from image features, head-pose information, and gaze-point labels. Because adaptation is handled by this separate network, PTGE removes the expensive optimization process required by few-shot approaches. The network is composed of transformers, so self-attention can weigh the quality of calibration samples and mitigate the negative effects of corrupt inputs. PTGE achieves near state-of-the-art performance of 1.49 cm on GazeCapture with a small number of calibration samples (≤16) and no optimization when adapting to a new user, only a 2% decrease from the state of the art, which requires an hour-long optimization process.
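
To make the described architecture concrete, below is a minimal sketch of such a calibration network in PyTorch. It is an illustration under stated assumptions, not the authors' exact design: the class name `CalibrationTransformer`, all layer sizes, the mean-pooling over samples, and the feature dimensions are hypothetical; the abstract specifies only that a transformer network maps calibration samples (image features, head pose, gaze-point labels) to a low-dimensional person-specific latent vector via self-attention.

```python
# Minimal sketch of a PTGE-style calibration network (assumptions noted above).
import torch
import torch.nn as nn

class CalibrationTransformer(nn.Module):
    """Maps a set of calibration samples to a person-specific latent vector.

    Each sample concatenates an image feature vector, a head-pose vector,
    and a 2D gaze-point label. Self-attention over the sample set lets the
    model weigh sample quality, so corrupt samples contribute less.
    """
    def __init__(self, feat_dim=128, pose_dim=2, gaze_dim=2,
                 d_model=64, n_heads=4, n_layers=2, latent_dim=16):
        super().__init__()
        self.embed = nn.Linear(feat_dim + pose_dim + gaze_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=128,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.to_latent = nn.Linear(d_model, latent_dim)

    def forward(self, feats, poses, gazes):
        # feats: (B, N, feat_dim), poses: (B, N, pose_dim), gazes: (B, N, gaze_dim)
        x = self.embed(torch.cat([feats, poses, gazes], dim=-1))
        x = self.encoder(x)       # self-attention across calibration samples
        x = x.mean(dim=1)         # pool the sample set into one vector per person
        return self.to_latent(x)  # low-dimensional person-specific latent


# Usage: up to 16 calibration samples, a single forward pass,
# no gradient-based optimization at adaptation time.
if __name__ == "__main__":
    net = CalibrationTransformer()
    feats = torch.randn(1, 16, 128)  # image features from a gaze backbone
    poses = torch.randn(1, 16, 2)    # head pose (e.g., pitch/yaw)
    gazes = torch.randn(1, 16, 2)    # labeled gaze points on screen
    z = net(feats, poses, gazes)     # person-specific latent, shape (1, 16)
    print(z.shape)
```

The key design point this sketch reflects is that adaptation is a single forward pass through a set encoder rather than an iterative per-user optimization, which is how the framework avoids the computational cost and fragility attributed to few-shot approaches.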