From Decoupled to Coupled: Robustness Verification for Learning-based Keypoint Detection with Joint Specifications

TMLR Paper5793 Authors

02 Sept 2025 (modified: 16 Sept 2025) | Under review for TMLR | CC BY 4.0

Abstract: Keypoint detection underpins many vision pipelines, from human-pose estimation and viewpoint recovery to 3D reconstruction. Yet modern neural models remain vulnerable to subtle input variations. Despite its importance, robustness verification for keypoint detection remains largely unexplored due to the high dimensionality of input spaces and the complexity of deep models. In this work, we verify a property that bounds the joint deviation across all keypoints, capturing interdependencies among keypoints specified by system designers or derived from downstream performance requirements (e.g., pose-based error budgets). A few existing approaches reformulate the problem by decoupling each keypoint (or its neighboring pixels) into independent classification tasks, which leads to overly conservative guarantees and fails to account for the collective role keypoints play in downstream tasks. We address this gap with the first coupled robustness verification framework for heatmap-based keypoint detectors under joint specifications. Our method supports any backbone architecture (e.g., CNN, ResNet, Transformer) that produces per-keypoint heatmaps, followed by a max-activation operation to extract coordinates. To do so, we combine reachability analysis and optimization: we formulate robustness verification as a property falsification problem, encoded as a Mixed-Integer Linear Program (MILP) over (i) reachable sets of heatmap outputs, obtained via existing reachability analysis tools, and (ii) a polytope encoding the joint keypoint deviation constraint. Infeasibility of the MILP certifies robustness, while feasibility yields a potential counterexample. We prove that our method is sound, that is, it certifies robustness only when the property truly holds. Experiments demonstrate that our coupled method achieves a verified rate comparable to the testing-based method when the keypoint error thresholds are not tight. Moreover, under stricter keypoint error thresholds, our method maintains a high verified rate, whereas the decoupled approach fails to verify the robustness of any image in these scenarios.
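
The falsification idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation: it assumes the reachable set of each heatmap is a simple box (per-pixel lower and upper bounds), assumes the joint specification is an L1 budget written as a polytope A d <= b, and replaces the MILP solver with brute-force enumeration over candidate argmax pixels. All function names (`box`, `feasible_argmax`, `l1_polytope`, `find_counterexample`) are hypothetical.

```python
from itertools import product


def box(heatmap, eps):
    """Assumed reachable set: every pixel value may move by at most eps."""
    lower = {p: v - eps for p, v in heatmap.items()}
    upper = {p: v + eps for p, v in heatmap.items()}
    return lower, upper


def feasible_argmax(lower, upper):
    """Pixels that can be the max activation for some heatmap inside the box."""
    return [
        p for p in lower
        # p can win if its upper bound reaches every other pixel's lower bound
        if all(upper[p] >= lower[q] for q in lower if q != p)
    ]


def l1_polytope(dim, budget):
    """Polytope A d <= b that is equivalent to sum(|d_i|) <= budget."""
    A = [list(signs) for signs in product((-1, 1), repeat=dim)]
    b = [budget] * len(A)
    return A, b


def find_counterexample(bounds, truth, A, b):
    """Search for predicted keypoints whose joint deviation violates A d <= b.

    Returns a tuple of pixels (one per keypoint) or None. Because the box
    over-approximates the true outputs, None certifies robustness, while a
    returned tuple is only a potential counterexample.
    """
    candidates = [feasible_argmax(lo, up) for lo, up in bounds]
    for combo in product(*candidates):
        d = []  # stacked deviations (d_row, d_col) for all keypoints
        for (r, c), (tr, tc) in zip(combo, truth):
            d += [r - tr, c - tc]
        if any(sum(a * x for a, x in zip(row, d)) > rhs for row, rhs in zip(A, b)):
            return combo
    return None


# Two keypoints on a 1x3 heatmap each; ground truth at the nominal maxima.
h1 = {(0, 0): 0.1, (0, 1): 0.9, (0, 2): 0.7}
h2 = {(0, 0): 0.8, (0, 1): 0.6, (0, 2): 0.1}
truth = [(0, 1), (0, 0)]
bounds = [box(h1, 0.15), box(h2, 0.15)]

loose = find_counterexample(bounds, truth, *l1_polytope(4, 2))
tight = find_counterexample(bounds, truth, *l1_polytope(4, 1))
print(loose)  # None: no joint deviation exceeds the budget of 2
print(tight)  # ((0, 2), (0, 1)): both keypoints shift by one pixel
```

In this toy setting each keypoint can move by at most one pixel, so a joint budget of 2 is certified while a budget of 1 admits a potential counterexample. Enumeration is exponential in the number of keypoints, which is one reason an MILP encoding is the more practical choice at realistic sizes.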

Submission Length: Long submission (more than 12 pages of main content)

Assigned Action Editor: ~Chinmay_Hegde1

Submission Number: 5793
