Self-Attention with Convolution and Deconvolution for Efficient Eye Gaze Estimation from a Full Face Image

05 Nov 2022 · OpenReview Archive Direct Upload
Abstract: This paper proposes a new eye gaze estimation network based on the full face image to address low generalization performance. Due to the high variance of facial appearance and environmental conditions, conventional gaze estimation methods generalize poorly and are easily overfitted to the training subjects. To solve this problem, we adopt a self-attention mechanism, which offers better generalization performance. Nevertheless, applying self-attention directly to an image incurs a high computational cost. We therefore introduce a new projection that uses convolution over the entire face image to accurately model the local context and reduce the computational cost of self-attention. The proposed model also includes a deconvolution that transforms the down-sampled global context back to the same size as the input so that spatial information is not lost. We confirmed through experiments that the new method achieves state-of-the-art results on the EYEDIAP, MPIIFaceGaze, Gaze360 and RT-GENE datasets, with a performance gain of 0.02° to 0.30° over the previous state-of-the-art model. In addition, we show the generalization performance of the proposed model through a cross-dataset evaluation.
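
To illustrate the idea described in the abstract, the sketch below shows one plausible way to combine a convolutional Q/K/V projection (which down-samples the feature map and captures local context before attention) with a transposed convolution that restores the attended global context to the input resolution. This is a minimal, hypothetical sketch based only on the abstract, not the authors' released code; all module and parameter names (e.g. `ConvAttentionBlock`, `stride`) are assumptions.

```python
# Hypothetical sketch: self-attention with convolutional projection and
# deconvolution up-sampling, loosely following the abstract's description.
import torch
import torch.nn as nn

class ConvAttentionBlock(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4, stride: int = 2):
        super().__init__()
        # Strided depthwise convolutions serve as the Q/K/V projections:
        # they model local context and down-sample the map, shrinking the
        # token sequence and hence the cost of self-attention.
        self.q_proj = nn.Conv2d(channels, channels, 3, stride, 1, groups=channels)
        self.k_proj = nn.Conv2d(channels, channels, 3, stride, 1, groups=channels)
        self.v_proj = nn.Conv2d(channels, channels, 3, stride, 1, groups=channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Transposed convolution ("deconvolution") restores the attended
        # global context to the input spatial size, preserving spatial detail.
        self.deconv = nn.ConvTranspose2d(channels, channels,
                                         kernel_size=stride, stride=stride)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)

        def to_tokens(t: torch.Tensor) -> torch.Tensor:
            # (B, C, H', W') -> (B, H'*W', C) token sequence for attention.
            return t.flatten(2).transpose(1, 2)

        out, _ = self.attn(to_tokens(q), to_tokens(k), to_tokens(v))
        hd, wd = q.shape[2], q.shape[3]
        out = out.transpose(1, 2).reshape(b, c, hd, wd)  # back to feature map
        out = self.deconv(out)                           # up-sample to (H, W)
        return x + out                                   # residual connection

# Example: a 56x56 face feature map with 64 channels keeps its size.
feat = torch.randn(1, 64, 56, 56)
block = ConvAttentionBlock(channels=64)
print(block(feat).shape)  # torch.Size([1, 64, 56, 56])
```

Here the depthwise strided convolutions and the multi-head attention are standard PyTorch building blocks; the actual paper may use a different projection layout, head count, or up-sampling scheme.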