Keywords: face landmark detection
TL;DR: We achieve (1) the ability to train a face landmark model on multiple datasets with different face landmark layouts and (2) unlimited on-demand landmark prediction. Our focus is not to outperform the SOTA but to competitively offer these benefits.
Abstract: Although advancements in face landmark detection (FLD) methods continue to push performance boundaries, they overlook two major functional limitations: (1) separate network parameters must be trained independently for each "N-point" benchmark dataset, and (2) a model trained on an "N-point" dataset can reliably output only those N landmarks. In our work, we first conceptualize Face Part-Anchored Landmark Positions (FPALPs), wherein each landmark is treated as a progression value between zero (start) and one (end) along a face part's contour. Every landmark can be expressed in the FPALP format, irrespective of its source dataset, thus unlocking the ability to unify all "N-point" datasets into a single dataset. Second, we represent each landmark with an FPALP-based query, refine it progressively with a cross-modality decoder, and predict its coordinates from the final representation. Our approach, called Unified Dynamic FLD, embodies these two design choices and streamlines the landmark detection pipeline by enabling a single model (1) to learn from any number of "N-point" datasets, and (2) to yield any number of specific landmark predictions by loading the designated landmark queries at runtime. Extensive experiments on several benchmark datasets demonstrate that our approach achieves the above benefits while performing competitively with, if not better than, existing SOTA methods on individual-dataset and cross-dataset evaluations.
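The FPALP idea, as described in the abstract, maps each landmark to a pair (face part, progression value in [0, 1] along that part's contour). A minimal sketch of how such a progression value could be decoded back to a 2D point by arc-length interpolation is shown below; this is an illustrative assumption, not the paper's actual implementation, and the `fpalp_to_point` helper and `jaw` contour are hypothetical names.

```python
import numpy as np

def fpalp_to_point(contour, t):
    """Interpolate the 2D point at progression t in [0, 1] along a
    face-part contour given as an (M, 2) polyline. Illustrative sketch
    of the FPALP representation, not the paper's exact method."""
    contour = np.asarray(contour, dtype=float)
    seg = np.linalg.norm(np.diff(contour, axis=0), axis=1)  # segment lengths
    cum = np.concatenate([[0.0], np.cumsum(seg)])           # cumulative arc length
    s = t * cum[-1]                                         # target arc length
    i = min(np.searchsorted(cum, s, side="right") - 1, len(seg) - 1)
    alpha = (s - cum[i]) / seg[i] if seg[i] > 0 else 0.0
    return (1 - alpha) * contour[i] + alpha * contour[i + 1]

# Hypothetical straight "jawline" contour; progression 0.25 lands a
# quarter of the way along its total arc length.
jaw = [(0, 0), (2, 0), (4, 0)]
print(fpalp_to_point(jaw, 0.25))  # [1. 0.]
```

Because any "N-point" layout reduces to such (part, progression) pairs, landmarks from heterogeneous datasets share one representation, which is what lets a single model train across layouts.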
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 18868