Abstract: As autonomous driving technologies advance, occupants are expected to be freed from the driving task, which diversifies the ways they interact with vehicles. Despite the growing importance of in-vehicle occupant monitoring systems, most existing systems focus on tracking occupants' faces or heads, and only a few studies have attempted to detect their poses. In this paper, we present the first framework specialized for the in-vehicle environment that jointly estimates 3D human pose and shape from a single image. To this end, we introduce a new dataset called Human In VEhicles (HIVE), which contains a large collection of synthesized humans with different shapes and poses in vehicle images. HIVE provides paired RGB and NIR in-vehicle images with ground-truth 2D and 3D pose and shape annotations. In addition, to exploit the differing characteristics of humans in vehicles and in unconstrained environments, we present a new pose prior that penalizes poses deviating from in-vehicle poses. The pose prior is derived using a variational autoencoder trained on in-vehicle human pose data. By using the proposed HIVE dataset and pose prior together with a carefully designed two-stage training procedure, our method significantly improves pose and shape estimation performance over state-of-the-art methods on real-world test images captured in vehicles under different conditions.