\begin{abstract}
Self-supervised learning (SSL) has become prevalent for learning representations in computer vision. Notably, SSL exploits contrastive learning to encourage visual representations to be invariant under various image transformations. Gaze estimation, on the other hand, demands not only invariance to appearance variations but also equivariance to geometric transformations. In this work, we propose a simple contrastive representation learning framework for gaze estimation, named \textit{Gaze Contrastive Learning (\gazeclr{})}. \gazeclr{} exploits multi-view data to promote equivariance and relies on selected data augmentation techniques that do not alter gaze directions to learn invariance. Our experiments demonstrate the effectiveness of \gazeclr{} in several gaze estimation settings. In particular, \gazeclr{} improves cross-domain gaze estimation performance, yielding a relative improvement of up to $17.2\%$. Moreover, the \gazeclr{} framework is competitive with state-of-the-art representation learning methods in few-shot evaluation. The code and pre-trained models are available at \url{https://github.com/jswati31/gazeclr}.






%It's fine if abstract is abstract (i.e., not complete)

% Extensive experiments evaluate the effectiveness of learned representations on several standard experimental settings, namely, few-shot learning, transfer-learning across cross-datasets, and with-in dataset generalization. 
% Our results show that  \gazeclr{} can attain improvements as high as $3.9$  degrees in the absolute gaze error and $55\%$ relative improvement in the gaze accuracy.

% Concretely, \gazeclr{} employs equivariance to exploit the untapped potential of multi-view data arising from video, and additionally, relies on selected data augmentation techniques that do not alter gaze directions. 
%therefore we design a contrastive loss such that \gazeclr{} induces equivariance in the representations. 
%We also demonstrate the complementary strengths of invariance and equivariance properties in improving the performance of both  image-based and video-based gaze estimation. 

% To the best of our knowledge, we are first to explore multi-view self supervised learning for the task of gaze estimation.

% enjoys complementary strengths of both invariance and equivariance properties. We show that inducing both invariance and equivariance can be helpful to improve the quality of representations. 
% We demonstrate the effectiveness of our method on the task of both image-based and video-based gaze estimation. Furthermore, we also evaluate GazeCLR with few-shot gaze estimation for cross-domain datasets. To the best of our knowledge, we are first to explore multi-view self supervised learning for the task of gaze estimation.

% \keywords{Self-supervised Learning; Gaze Estimation}
\end{abstract}
