Keywords: observational supervision, eye tracking, gaze data, limited labeled data
TL;DR: We explore using passively collected eye-tracking data to reduce the amount of labeled data needed during training.
Abstract: Supervised machine learning models for high-value computer vision applications such as medical image classification often require large datasets labeled by domain experts, which are slow to collect, expensive to maintain, and static with respect to changes in the data distribution. In this context, we assess the utility of observational supervision, where we take advantage of passively collected signals such as eye tracking or “gaze” data to reduce the amount of hand-labeled data needed for model training. Specifically, we leverage gaze information to directly supervise a visual attention layer by penalizing disagreement between the spatial regions the human labeler looked at the longest and those that most heavily influence model output. We present evidence that constraining the model in this way can reduce the number of labeled examples required to achieve a given performance level by as much as 50%, and that gaze information is most helpful on more difficult tasks.
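
To make the attention-supervision idea concrete, the following is a minimal sketch of one way such a penalty could be written. The abstract does not specify the form of the disagreement term, so the KL divergence used here, the function names, and the `weight` parameter are all illustrative assumptions rather than the method evaluated in this work.

```python
import numpy as np

def normalize_map(m, eps=1e-8):
    """Turn a non-negative 2-D map into a spatial distribution summing to 1."""
    m = np.asarray(m, dtype=np.float64) + eps  # eps avoids log(0) below
    return m / m.sum()

def gaze_attention_penalty(gaze_dwell, attention):
    """Assumed penalty: KL(gaze || attention) over spatial cells.

    gaze_dwell: per-cell dwell time of the human labeler (hypothetical input).
    attention:  per-cell attention weights from the model (hypothetical input).
    """
    p = normalize_map(gaze_dwell)
    q = normalize_map(attention)
    return float(np.sum(p * np.log(p / q)))

def total_loss(task_loss, gaze_dwell, attention, weight=0.5):
    """Task loss plus a weighted gaze-disagreement term (weight is an assumption)."""
    return task_loss + weight * gaze_attention_penalty(gaze_dwell, attention)

# Toy 2x2 example: the labeler looked only at the top-right cell.
gaze = np.array([[0.0, 2.0], [0.0, 0.0]])
aligned = np.array([[0.10, 0.80], [0.05, 0.05]])     # attention agrees with gaze
misaligned = np.array([[0.05, 0.05], [0.80, 0.10]])  # attention looks elsewhere

print(gaze_attention_penalty(gaze, aligned))
print(gaze_attention_penalty(gaze, misaligned))
```

In this toy setup the misaligned attention map receives the larger penalty, so minimizing the combined loss pushes the attention layer toward the regions the labeler looked at longest. Other disagreement measures (for example, mean squared error between the two maps) would fit the same description.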