Through the eyes of the beholder – a study on zero-shot digitization of lab measurements

NeurIPS 2023 Workshop Gaze Meets ML Submission 16 Authors

07 Oct 2023 (modified: 27 Oct 2023) Submitted to Gaze Meets ML 2023
Keywords: Gaze, computer vision, lab automation
Abstract: The automatic tracking of human actions in scientific pipelines can greatly improve their reproducibility and reliability, ensuring accountability, quality control, traceability, and replicability of protocols. Self-recording is a non-intrusive strategy for creating a visual diary, and egocentric video recordings capture detailed information about the actions performed without influencing their ordinary course. However, raw videos generate massive amounts of unlabelled data that are challenging to index and use for information retrieval. In this paper, we study how gaze information can aid the analysis of egocentric video recordings and the automatic extraction of accurate measurements from first-person procedures and manual operations. We propose a novel approach that uses gaze tracking to perform a fine-grained segmentation of the raw video content at both the temporal and spatial levels. Building on this gaze-driven segmentation, we then devise a methodology to extract precise quantitative information about two types of human actions: measuring a liquid volume and weighing an object on a scale. Both are examples of repetitive laboratory measurements that require high reproducibility. Results show that gaze provides clear benefits in terms of temporal segmentation and the computational cost of information extraction. With this, we wish to open the discussion on gaze-based prompting to obtain real-world measurements from unlabelled egocentric video recordings, leveraging recent advances in foundation models for image segmentation. Our examples show how synergies between gaze estimation and computer vision can support the annotation of precise information, and we foresee that they will facilitate the natural interaction of human or robotic operators in scientific environments.
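To make the idea of gaze-based prompting concrete, the sketch below shows how a 2D gaze fixation could be fed as a point prompt to a segmentation foundation model. The abstract does not name a specific model, so the use of the Segment Anything Model (SAM), the checkpoint path, and the example fixation coordinates are illustrative assumptions rather than the authors' exact pipeline.

```python
# Minimal sketch: using a gaze fixation as a point prompt for SAM.
# Assumptions (not from the paper): SAM as the segmentation backbone,
# the checkpoint filename, and the example gaze coordinates.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

# Load a SAM checkpoint (model type and path are placeholders).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
predictor = SamPredictor(sam)

def segment_at_gaze(frame_bgr, gaze_xy):
    """Segment the object the operator is looking at in one video frame.

    frame_bgr : HxWx3 uint8 frame from the egocentric scene camera.
    gaze_xy   : (x, y) gaze point in pixel coordinates, already mapped
                from the eye tracker to the scene camera.
    """
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    predictor.set_image(frame_rgb)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([gaze_xy], dtype=np.float32),
        point_labels=np.array([1]),   # 1 = foreground prompt
        multimask_output=True,
    )
    return masks[np.argmax(scores)]   # keep the highest-scoring mask

# Example usage on one frame with a hypothetical fixation at (640, 360):
# frame = cv2.imread("frame_000123.png")
# mask = segment_at_gaze(frame, (640, 360))
```

In the same spirit, fixations clustered over time could mark the temporal boundaries of each measurement step, so that only the fixated frames and regions are passed to the (more expensive) segmentation and reading stages.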
Submission Type: Full Paper
Submission Number: 16