Keywords: human action prediction, human-machine interaction, eye gaze
TL;DR: Regularizing a transformer's (VLM's) attention with eye-gaze heatmaps improves the accuracy of human action prediction.
Abstract: Eye gaze, encompassing fixations and saccades, offers valuable insight into human intentions and future actions. This study presents a novel approach to enhancing Vision Language Models (VLMs) for human action prediction by integrating eye gaze data into egocentric video analysis. Existing methods for action anticipation in egocentric videos often rely solely on visual data, missing the critical information that eye gaze provides. To address this limitation, we propose a gaze-augmented framework that integrates eye gaze directly into the VLM architecture and training process. By generating gaze heatmaps from eye gaze coordinates, our model dynamically focuses on the regions highlighted by gaze patterns. In addition, a gaze-regularization mechanism keeps the model's attention on gaze-allocated areas, improving prediction accuracy and robustness. As a result, the model generates more precise and detailed predictions of future actions: compared to baseline models that do not leverage gaze data, our method improves the semantic score of predictions by nearly 13%. This improvement underscores the effectiveness of combining eye gaze with a gaze-regularized attention mechanism in VLMs for action anticipation, and demonstrates that the gaze-augmented framework can substantially boost the predictive capabilities of VLMs in applications that require accurate human action prediction.
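The two mechanisms the abstract names can be illustrated with a minimal sketch: rendering gaze coordinates as a Gaussian heatmap over image patches, and penalizing the divergence between a transformer's patch attention and that heatmap. Everything below is an assumption for illustration, not the submission's actual implementation: the function names (`gaze_heatmap`, `gaze_regularization`), the Gaussian rendering, the 14x14 patch grid, and the KL-divergence penalty are all hypothetical choices.

```python
# Hypothetical sketch of gaze-heatmap generation and gaze-regularized
# attention, as the abstract describes. Shapes, names, and the Gaussian/KL
# choices are illustrative assumptions, not the authors' method.
import torch
import torch.nn.functional as F

def gaze_heatmap(gaze_xy: torch.Tensor, grid: int = 14, sigma: float = 1.5) -> torch.Tensor:
    """Render normalized gaze coordinates (B, 2) in [0, 1] as (B, grid, grid) Gaussians."""
    ys, xs = torch.meshgrid(
        torch.arange(grid, dtype=torch.float32),
        torch.arange(grid, dtype=torch.float32),
        indexing="ij",
    )
    # Scale gaze points to heatmap grid coordinates.
    cx = gaze_xy[:, 0, None, None] * (grid - 1)
    cy = gaze_xy[:, 1, None, None] * (grid - 1)
    heat = torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    # Normalize each map into a probability distribution over patches.
    return heat / heat.sum(dim=(1, 2), keepdim=True)

def gaze_regularization(attn: torch.Tensor, heat: torch.Tensor) -> torch.Tensor:
    """KL divergence pushing patch attention toward the gaze heatmap.

    attn: (B, P) attention weights of a query token over P = grid*grid patches.
    heat: (B, grid, grid) gaze heatmap from gaze_heatmap().
    """
    target = heat.flatten(1)
    return F.kl_div(attn.clamp_min(1e-8).log(), target, reduction="batchmean")
```

Under these assumptions, the regularizer would be added to the task objective as `loss = task_loss + lambda_gaze * gaze_regularization(attn, heat)`, so that training simultaneously fits the action-prediction target and keeps attention on gaze-allocated regions.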
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2329