Keywords: landmark search, prior knowledge, proximity-weighted contrastive learning
TL;DR: This paper introduces a highly efficient landmark detection algorithm utilizing prior knowledge of landmarks.
Abstract: We present a highly efficient, agent-based framework for facial landmark detection that prioritizes model compactness and computational efficiency over maximum accuracy. Unlike conventional approaches that rely on large, fully supervised models, our method assigns each agent to a specific landmark, enabling it to infer its position solely from local observations and prior knowledge, without explicit location awareness or inter-agent communication. Prior knowledge is modeled in two embedding spaces (feature and coordinate) using class-conditional Gaussian distributions. Agents navigate by minimizing deviations from these priors via a lightweight policy network. To enhance representation learning, we introduce a proximity-weighted contrastive learning strategy that incorporates spatial proximity into the training objective. A multi-stage detection strategy further reduces redundant computation by detecting sub-landmarks relative to core landmarks. While our method produces a slightly higher normalized mean error than state-of-the-art (SoTA) methods, it achieves over $16\times$ and $41\times$ improvements in space and time complexity, respectively, compared to the SoTA lightweight model, running at $4.19$ and $1.29$ frames per second on an i5 CPU (2.5 GHz) for the COFW and 300W datasets, respectively.
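As a rough illustration of what "minimizing deviations from these priors" could look like, the sketch below scores an agent's embeddings against a per-landmark Gaussian prior and picks the candidate move with the smallest score. It is only a minimal sketch under stated assumptions, not the paper's implementation: the diagonal covariance, the use of negative log-likelihood as the deviation measure, the names `gaussian_nll`, `prior_deviation`, and `best_move`, the dictionary layout of `prior`, and the greedy selection (standing in for the learned policy network) are all hypothetical.

```python
import math


def gaussian_nll(x, mean, var):
    """Negative log-likelihood of x under a diagonal Gaussian N(mean, diag(var)).

    Assumption: a diagonal covariance is used here purely for simplicity.
    """
    return 0.5 * sum(
        (xi - mi) ** 2 / vi + math.log(2.0 * math.pi * vi)
        for xi, mi, vi in zip(x, mean, var)
    )


def prior_deviation(feature_emb, coord_emb, prior):
    """Total deviation from one landmark's prior in both embedding spaces.

    `prior` is a hypothetical dict holding the class-conditional mean and
    variance for the feature and coordinate embedding spaces.
    """
    return (
        gaussian_nll(feature_emb, prior["feat_mean"], prior["feat_var"])
        + gaussian_nll(coord_emb, prior["coord_mean"], prior["coord_var"])
    )


def best_move(candidates, prior):
    """Greedy stand-in for the policy network (an assumption, not the paper's method).

    `candidates` maps a move name to the (feature_emb, coord_emb) pair the
    agent would observe after taking that move.
    """
    return min(candidates, key=lambda m: prior_deviation(*candidates[m], prior))


# Toy prior for one landmark class (illustrative numbers only).
prior = {
    "feat_mean": [0.0, 0.0], "feat_var": [1.0, 1.0],
    "coord_mean": [1.0, 1.0], "coord_var": [1.0, 1.0],
}
candidates = {
    "left": ([2.0, 2.0], [3.0, 3.0]),
    "right": ([0.1, 0.0], [1.0, 1.0]),
}
chosen = best_move(candidates, prior)
```

In this toy setup the "right" move yields embeddings much closer to both prior means, so it has the lower deviation and is selected.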
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 16871