VGGT-HPE: Reframing Head Pose Estimation as Relative Pose Prediction

Vasiliki Vasileiou; Panagiotis Filntisis; Petros Maragos; Kostas Daniilidis

Published: 02 Jun 2026, Last Modified: 02 Jun 2026. Venue: Greeks in AI 2026. License: CC BY 4.0

Keywords: Head pose estimation, Relative pose estimation, Geometry foundation models, Synthetic-to-real transfer, VGGT

Domains: Vision and Learning, Robotics

TL;DR: CVPRW 2026 (ABAW)

External Link: https://vasilikivas.github.io/VGGT-HPE/

Abstract: Monocular head pose estimation is traditionally formulated as direct regression from a single image to an absolute pose. This paradigm forces the network to implicitly internalize a dataset-specific canonical reference frame. In this work, we argue that predicting the relative rigid transformation between two observed head configurations is a fundamentally easier and more robust formulation. We introduce VGGT-HPE, a relative head pose estimator built upon a general-purpose geometry foundation model. Finetuned exclusively on synthetic facial renderings, our method sidesteps the need for an implicit anchor by reducing the problem to estimating a geometric displacement from an explicitly provided anchor with a known pose. As a practical benefit, the relative formulation also allows the anchor to be chosen at test time (for instance, a near-neutral frame or a temporally adjacent one), so that the prediction difficulty can be controlled by the application. Despite using zero real-world training data, VGGT-HPE achieves state-of-the-art results on the BIWI benchmark, outperforming established absolute regression methods trained on mixed and real datasets. Through controlled easy- and hard-pair benchmarks, we also systematically validate our core hypothesis: relative prediction is intrinsically more accurate than absolute regression, with the advantage scaling alongside the difficulty of the target pose. Project page and code: https://vasilikivas.github.io/VGGT-HPE
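The relative formulation described in the abstract can be illustrated with a minimal sketch: given an anchor with a known absolute pose and a relative rigid transformation, the target's absolute pose follows by composition. This is not the authors' code. The helper names (`rot_y`, `make_pose`, `relative_pose`, `compose_absolute`, `geodesic_deg`), the 4x4 homogeneous representation, and the left-multiplication convention `T_target = T_rel @ T_anchor` are all assumptions made for illustration; the paper may use a different parameterization. The relative transform is computed here from ground truth, standing in for what a network would predict from an image pair.

```python
import numpy as np

def rot_y(deg):
    """Rotation about the y axis (a yaw-like turn) by `deg` degrees."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def make_pose(R, t):
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 rigid transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def invert_pose(T):
    """Closed-form inverse of a rigid transform: (R, t) -> (R^T, -R^T t)."""
    R, t = T[:3, :3], T[:3, 3]
    return make_pose(R.T, -R.T @ t)

def relative_pose(T_anchor, T_target):
    """Relative transform under the assumed convention T_target = T_rel @ T_anchor."""
    return T_target @ invert_pose(T_anchor)

def compose_absolute(T_anchor, T_rel):
    """Recover the target's absolute pose from a known anchor pose and a relative transform."""
    return T_rel @ T_anchor

def geodesic_deg(R_a, R_b):
    """Geodesic angle in degrees between two rotation matrices."""
    cos = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    return np.rad2deg(np.arccos(np.clip(cos, -1.0, 1.0)))

# Hypothetical poses: anchor at 10 degrees of yaw, target at 40 degrees.
T_anchor = make_pose(rot_y(10.0), [0.0, 0.0, 0.5])
T_target = make_pose(rot_y(40.0), [0.05, 0.0, 0.5])

# Ground-truth relative transform, standing in for a model prediction.
T_rel = relative_pose(T_anchor, T_target)
T_recovered = compose_absolute(T_anchor, T_rel)

# Size of the rotation the model would have to predict for this anchor.
rel_angle = geodesic_deg(np.eye(3), T_rel[:3, :3])
```

In this toy setup the relative rotation spans the 30-degree yaw gap between anchor and target, so an anchor closer to the target would leave a smaller displacement to predict, which is one way to read the abstract's remark that the anchor choice controls prediction difficulty.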

Submission Number: 23
