Does Pixel Value Represent Facial Landmark Well in Heatmap?

Xing Lan, Jiayi Lyu, Kun Dong, Hanyu Jiang, Qinghao Hu, Jian Xue

Published: 01 Jan 2024, Last Modified: 01 Aug 2025 | IEEE Trans. Circuits Syst. Video Technol. 2024 | CC BY-SA 4.0

Abstract: Heatmap-based methods have dominated the face alignment task, yet the maximum-response decoding scheme still needs improvement. While some studies have attempted to compensate for prediction offsets using a post-processing module, the prediction errors induced by maximum-response decoding remain difficult to rectify. In this paper, we hypothesize that the heatmap value alone is not an accurate enough representation of the ground-truth probability. To address this problem, we propose DISPAL, a novel DIStribution-based Probability for fAcial Landmarks, which represents the ground-truth probability by the similarity between the distribution of a pixel's neighbouring values and a Gaussian distribution. This probability enables us to pinpoint the keypoint location more robustly than previous methods that rely solely on the peak score. It also generalizes well to more complex decoding methods. Furthermore, we propose supervising this probability with an additional task loss to help the model learn a better heatmap representation. Extensive empirical results on the WFLW, 300W, and COFW datasets demonstrate that our distribution-based probability mechanism significantly surpasses the original value-based probability approaches.
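The contrast between value-based and distribution-based decoding can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the neighbourhood size, or the Gaussian width, so the cosine similarity, the 5x5 window, and `sigma=1.0` used here are assumptions, and all function names are hypothetical.

```python
import numpy as np


def gaussian_template(size=5, sigma=1.0):
    """Return a size x size isotropic Gaussian centred on the middle pixel."""
    r = size // 2
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    return np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma ** 2))


def distribution_score_map(heatmap, size=5, sigma=1.0):
    """Score each pixel by how Gaussian-like its neighbourhood looks.

    Assumption: similarity is the cosine similarity between the local
    patch and a Gaussian template. The paper may use another measure.
    """
    r = size // 2
    template = gaussian_template(size, sigma).ravel()
    template = template / np.linalg.norm(template)
    padded = np.pad(heatmap, r, mode="constant")
    h, w = heatmap.shape
    scores = np.zeros((h, w), dtype=np.float64)
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + size, x:x + size].ravel()
            norm = np.linalg.norm(patch)
            if norm > 1e-12:  # leave (near-)empty patches at score 0
                scores[y, x] = float(patch @ template) / norm
    return scores


def decode_max_value(heatmap):
    """Baseline: take the pixel with the largest heatmap value."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(x), int(y)


def decode_distribution(heatmap, size=5, sigma=1.0):
    """Take the pixel whose neighbourhood is most Gaussian-like."""
    scores = distribution_score_map(heatmap, size, sigma)
    y, x = np.unravel_index(np.argmax(scores), scores.shape)
    return int(x), int(y)


def make_heatmap(h, w, cx, cy, sigma=1.0):
    """Synthetic heatmap: one Gaussian blob centred at (cx, cy)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


# Toy example: a Gaussian blob at (20, 12) plus an isolated outlier spike.
demo = make_heatmap(32, 32, cx=20, cy=12)
demo[5, 5] = 1.5  # spurious high response with no Gaussian neighbourhood
print("max-value decoding:   ", decode_max_value(demo))
print("distribution decoding:", decode_distribution(demo))
```

In this toy setting the isolated spike has the highest pixel value, so the baseline decoder selects it, while the neighbourhood-based score favours the blob whose surroundings actually follow a Gaussian shape. Because cosine similarity ignores magnitude, a practical variant would likely need to combine it with the heatmap value or restrict it to candidate peaks; that design choice is outside what the abstract states.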