Abstract: A driver’s eye gaze holds a wealth of cognitive and intentional cues crucial for intelligent vehicles. Despite its significance, research on in-vehicle gaze estimation remains limited due to the scarcity of comprehensive and well-annotated datasets in real driving scenarios. In this paper, we present three novel elements to advance in-vehicle gaze research. First, we introduce IVGaze, a pioneering dataset capturing in-vehicle gaze, collected from 125 subjects and covering a large range of gaze and head poses within vehicles. For this dataset, we propose a new vision-based solution for in-vehicle gaze collection, introducing a refined gaze target calibration method to tackle annotation challenges. Second, our research focuses on in-vehicle gaze estimation leveraging IVGaze. In-vehicle face images often suffer from low resolution, prompting our introduction of a gaze pyramid transformer that leverages transformer-based multilevel feature integration. Expanding upon this, we introduce the dual-stream gaze pyramid transformer (GazeDPTR). Employing perspective transformation, we rotate virtual cameras to normalize images, utilizing camera pose to merge normalized and original images for accurate gaze estimation. GazeDPTR achieves state-of-the-art performance on the IVGaze dataset. Third, we explore a novel strategy for gaze zone classification by extending GazeDPTR: we newly define a foundational tri-plane and project gaze onto these planes. Leveraging both positional features from the projection points and visual attributes from images, we achieve superior performance compared to relying solely on visual features, substantiating the advantage of gaze estimation. The project is available at https://yihua.zone/work/ivgaze.
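To make the perspective-transformation step concrete, below is a minimal, illustrative sketch (not the authors' released code) of rotating a virtual camera to face the driver and warping the frame by the induced homography. The function name `normalize_image`, the intrinsic matrices `K_real` and `K_virtual`, and the 3D `face_center` estimate are all assumptions introduced for illustration.

```python
# Minimal sketch of perspective-based image normalization, assuming a
# known real-camera intrinsic matrix and an estimated 3D face center.
import cv2
import numpy as np

def normalize_image(image, face_center, K_real, K_virtual, size=(224, 224)):
    """Warp `image` so a virtual camera looks straight at `face_center`.

    image:       H x W x 3 frame from the real in-vehicle camera.
    face_center: 3D face center in the real camera's coordinate frame.
    K_real:      3x3 intrinsics of the real camera.
    K_virtual:   3x3 intrinsics chosen for the virtual (normalized) camera.
    """
    # Build a rotation whose z-axis points from the camera to the face
    # center, keeping the x-axis as close to horizontal as possible.
    z = face_center / np.linalg.norm(face_center)
    x = np.cross(np.array([0.0, 1.0, 0.0]), z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    R = np.stack([x, y, z], axis=0)  # rotation: real -> virtual camera

    # For a pure rotation, the re-rendering is a homography:
    # p_virtual = K_virtual @ R @ inv(K_real) @ p_real
    H = K_virtual @ R @ np.linalg.inv(K_real)
    normalized = cv2.warpPerspective(image, H, size)
    return normalized, R
```

Under this reading, the returned rotation `R` is the camera-pose cue used to merge the normalized and original image streams, and it is also needed to map gaze predicted in the virtual view back to the real camera's coordinate frame.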