Temporal Understanding of Gaze Communication with GazeTransformer

Ryan Anthony de Belen; Gelareh Mohammadi; Arcot Sowmya

Published: 27 Oct 2023, Last Modified: 21 Nov 2023

Venue: Gaze Meets ML 2023 Poster

Submission Type: Full Paper

Keywords: Gaze estimation and prediction, gaze communication behaviour prediction

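TL;DR: We address temporal understanding of gaze communication in social videos in two stages: GazeTransformer infers atomic-level behaviours in each frame, and a temporal module uses them to predict event-level behaviours in a video.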

Abstract: Gaze plays a crucial role in daily social interactions as it allows humans to communicate intentions effectively. We address the problem of temporal understanding of gaze communication in social videos in two stages. First, we develop GazeTransformer, an end-to-end module that infers atomic-level behaviours in a given frame. Second, we develop a temporal module that predicts event-level behaviours in a video using the inferred atomic-level behaviours. Compared to existing methods, GazeTransformer does not require human head and object locations as input. Instead, it identifies these locations in a parallel and end-to-end manner. In addition, it can predict the attended targets of all predicted humans and infer more atomic-level behaviours that cannot be handled simultaneously by previous approaches. We achieve promising performance on both atomic- and event-level prediction on the (M)VACATION dataset. Code will be available at https://github.com/gazetransformer/gazetransformer.

Supplementary Material: zip

Submission Number: 5
