Keywords: Category Agnostic Pose Estimation, Keypoint Localization, Few Shot Learning, 2D Pose Estimation
TL;DR: This paper introduces EdgeCape, a graph-based approach for category-agnostic pose estimation that predicts category-agnostic pose-graphs to achieve improved accuracy
Abstract: Category-Agnostic Pose Estimation (CAPE) localizes keypoints across diverse object categories with a single model, using one or few annotated support images. Recent works have shown that using a pose-graph (i.e., treating keypoints as nodes in a graph rather than isolated points) helps handle occlusions and break symmetry. However, these methods assume a given pose-graph with equal-weight edges, leading to suboptimal results. We introduce EdgeCape, a novel framework that overcomes these limitations by predicting the graph's edge weights in order to optimize localization. To further leverage structural (i.e., graph) priors, we propose integrating Markov Attention Bias, which modulates the self-attention interaction between nodes based on the number of hops between them. We show that this improves the model’s ability to capture global spatial dependencies. Evaluated on the MP-100 benchmark, which includes 100 categories and over 20K images, EdgeCape achieves state-of-the-art results in the 1-shot and 5-shot settings, significantly improving keypoint localization accuracy. Our code will be publicly available.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 13208
Loading