Abstract: Keypoint detection is a pivotal step in 3D reconstruction,
whereby sets of (up to) K points are detected in each
view of a scene. Crucially, the detected points need to be
consistent between views, i.e., correspond to the same 3D
point in the scene. One of the main challenges with keypoint
detection is the formulation of the learning objective.
Previous learning-based methods typically jointly learn descriptors
with keypoints, and treat the keypoint detection as
a binary classification task on mutual nearest neighbours.
However, basing keypoint detection on descriptor nearest
neighbours is a proxy task, which is not guaranteed to produce
3D-consistent keypoints. Furthermore, this ties the
keypoints to a specific descriptor, complicating downstream
usage. In this work, we instead learn keypoints directly from
3D consistency. To this end, we train the detector to detect
tracks from large-scale structure-from-motion (SfM). As these points are often overly
sparse, we derive a semi-supervised two-view detection objective
to expand this set to a desired number of detections.
To train a descriptor, we maximize the mutual nearest
neighbour objective over the keypoints with a separate network.
Results show that our approach, DeDoDe, achieves
significant gains on multiple geometry benchmarks. Code
is provided at github.com/Parskatt/DeDoDe.
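
For readers unfamiliar with the mutual nearest neighbour criterion mentioned above, the following is a minimal illustrative sketch and not the DeDoDe implementation. It assumes descriptors are plain lists of floats compared by squared Euclidean distance; the function names (`squared_distance`, `nearest`, `mutual_nearest_neighbours`) and the toy data are hypothetical. A pair (i, j) is kept only if descriptor i in view A selects j in view B as its nearest neighbour and j selects i in return.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def nearest(query, candidates):
    """Index of the candidate closest to the query descriptor."""
    return min(range(len(candidates)),
               key=lambda j: squared_distance(query, candidates[j]))


def mutual_nearest_neighbours(desc_a, desc_b):
    """Return pairs (i, j) where desc_a[i] and desc_b[j] pick each other."""
    nn_ab = [nearest(d, desc_b) for d in desc_a]  # best match in B for each A
    nn_ba = [nearest(d, desc_a) for d in desc_b]  # best match in A for each B
    # Keep only the pairs that agree in both directions.
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]


# Toy descriptors (made up for illustration): three in view A, two in view B.
desc_a = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
desc_b = [[1.1, 0.0], [0.1, 0.0]]
matches = mutual_nearest_neighbours(desc_a, desc_b)
print(matches)
```

In this toy example the third descriptor in view A has a nearest neighbour in view B, but that neighbour prefers a different descriptor, so the pair is discarded. This also illustrates why the criterion is only a proxy: it depends entirely on the descriptors and says nothing directly about 3D consistency.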