Robust landmark-free head pose estimation by learning to crop and background augmentation

Aoru Xue, Kai Sheng, Songmin Dai, Xiaoqiang Li

Published: 2020, Last Modified: 24 Oct 2025. IET Image Process. 2020. CC BY-SA 4.0

Abstract: It is well known that the performance of head pose estimation is strongly affected by the bounding box margin of the face and by the background. Traditionally, researchers manually choose a bounding box margin that balances retaining sufficient information against minimising background noise. However, head pose estimation still degrades when the background is complex, as it often is in practice, or when the box margin changes slightly. To make the estimates more robust, the authors propose two methods: (i) a convolutional cropping module that learns to crop the input image to an attentional area for head pose regression, and (ii) background augmentation, which makes the network more robust to background noise. Rather than computing head pose angles from facial landmarks, they use a separate convolutional neural network to regress the angles directly, so the result is independent of landmark detection. They evaluate the method on the BIWI and AFLW2000 datasets, and the experimental results show that their approach outperforms many other methods. They also evaluate the method on the Pointing'04 dataset using head pose accuracy. Furthermore, the approach is more robust and has lower variance in realistic scenarios.
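
The abstract names two ingredients, the bounding box margin and background augmentation, without giving implementation details. The sketch below is one plausible reading in NumPy, not the authors' implementation: `expand_box` and `augment_background` are hypothetical helper names, the margin is assumed to be a fraction of the box size added on each side, and the augmentation is assumed to paste the face region onto a replacement background image of the same shape.

```python
import numpy as np


def expand_box(box, margin, img_w, img_h):
    """Grow an (x1, y1, x2, y2) box by `margin` times its size on each side.

    Assumption: the margin is relative to the box width and height, and the
    result is clipped to the image bounds.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    nx1 = max(0, int(round(x1 - margin * w)))
    ny1 = max(0, int(round(y1 - margin * h)))
    nx2 = min(img_w, int(round(x2 + margin * w)))
    ny2 = min(img_h, int(round(y2 + margin * h)))
    return nx1, ny1, nx2, ny2


def augment_background(image, face_box, background):
    """Keep the pixels inside `face_box` and replace the rest with `background`.

    Assumption: `image` and `background` are arrays of identical shape
    (height, width, channels); a real pipeline would likely use a face
    mask rather than a rectangle.
    """
    if image.shape != background.shape:
        raise ValueError("image and background must have the same shape")
    x1, y1, x2, y2 = face_box
    out = background.copy()
    out[y1:y2, x1:x2] = image[y1:y2, x1:x2]
    return out


# Toy usage: a uniform grey "face" image pasted onto a black background.
image = np.full((100, 100, 3), 200, dtype=np.uint8)
background = np.zeros((100, 100, 3), dtype=np.uint8)
box = expand_box((40, 40, 60, 60), margin=0.5, img_w=100, img_h=100)
augmented = augment_background(image, box, background)
```

Sampling a different `background` for each training example would expose the regressor to varied surroundings for the same head pose, which is the kind of robustness the abstract describes.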