Spatio-temporal eye contact detection combining CNN and LSTM

Published: 2019, Last Modified: 06 Nov 2023 | MVA 2019 | Readers: Everyone
Abstract: Eye contact (mutual gaze) is fundamental to human communication and social interaction, and it is therefore studied in many fields. To support the study of eye contact, much effort has gone into developing automated eye-contact detection using image recognition techniques. In recent years, eye-contact detection techniques based on convolutional neural networks (CNNs) have become popular because of their performance; however, they mainly use a single frame for recognition. Eye contact is a human communication behavior, so temporal information, such as sequences of eye images and facial poses, is important for increasing the accuracy of eye-contact detection. We incorporate temporal information into eye-contact detection by using temporal neural network structures that combine CNNs and long short-term memory (LSTM). We tested several combinations of CNNs and LSTM and found that the best-performing configuration feeds the outputs of the CNNs as well as the cell state vectors of the LSTM into the fully connected layers. We prepared two types of eye-contact video datasets: one is based on online videos, and the other was recorded with a first-person camera in assumed conversational scenarios. The results show that our method outperforms approaches that use single frames. Specifically, our method achieves an F1-score of 0.8781, whereas the existing method (DeepEC) achieves 0.8319.
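The combination described in the abstract, in which the fully connected layers receive both the CNN outputs and the LSTM cell state, can be illustrated with a minimal PyTorch sketch. This is not the authors' network: the class name `CnnLstmEyeContact`, the layer sizes, the input resolution, and the choice to fuse the last frame's CNN features with the final cell state are all assumptions made for illustration only.

```python
import torch
import torch.nn as nn


class CnnLstmEyeContact(nn.Module):
    """Hypothetical sketch: per-frame CNN features plus the LSTM cell state feed the FC head."""

    def __init__(self, feat_dim=64, hidden_dim=32):
        super().__init__()
        # Small per-frame CNN encoder (layer sizes are assumptions, not from the paper).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, feat_dim),
            nn.ReLU(),
        )
        # The LSTM consumes the sequence of per-frame CNN features.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # The FC head sees the CNN output and the LSTM cell state together.
        self.fc = nn.Sequential(
            nn.Linear(feat_dim + hidden_dim, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, clips):
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        # nn.LSTM returns (output, (h_n, c_n)); c_n has shape (num_layers, batch, hidden_dim).
        _, (_, cell) = self.lstm(feats)
        # Assumption: fuse the last frame's CNN features with the final cell state.
        fused = torch.cat([feats[:, -1], cell[-1]], dim=1)
        return self.fc(fused)  # one eye-contact logit per clip


model = CnnLstmEyeContact()
logits = model(torch.randn(2, 5, 3, 36, 60))  # 2 clips of 5 RGB frames each
probs = torch.sigmoid(logits)                 # eye-contact probability per clip
```

A single-frame baseline would correspond to dropping the LSTM branch and passing only the CNN features to the fully connected layers, which is the kind of comparison the abstract reports.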