Abstract: Learning-based feature descriptors have become dominant for their notable performance on feature matching tasks, driven by the rapid development of convolutional neural networks (CNNs). However, existing learning-based methods predict discriminative descriptors solely from the high-level features of the last layer of a deep CNN, neglecting the rich complementary clues hidden in intermediate multilevel features, which could further promote discriminative power by introducing an implicit hierarchical comparison into the descriptor space. This hinders the optimization of learned descriptors and limits their performance on real-world visual measurement tasks. In this regard, we propose hierarchical view consistency (HVC) to fully leverage the complementary information in multilevel features. Specifically, we first present a novel multiviewer neural network (MVNet), which benefits from multiple viewers with local-to-global receptive fields and efficiently generates dense descriptors in a coarse-to-fine manner. We then introduce HVC, i.e., enforcing consistent yet diverse hierarchical features between views, to encourage the viewers to encode as many hierarchical features as possible while increasing the hierarchical similarity of reliable matches. With our proposed triplet training strategy, MVNet leverages the rich hierarchical complementary clues in multilevel features and efficiently predicts highly discriminative descriptors. Experiments on feature matching and on the challenging visual measurement tasks of visual localization and visual 3-D reconstruction demonstrate that our proposed descriptor is efficient and generalizes well to various scenarios.
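The abstract does not give the loss formulation, but the idea of a triplet training strategy applied across multiple feature levels can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name `hierarchical_triplet_loss`, the per-level list representation, and the margin value are all assumptions for illustration.

```python
import numpy as np

def l2_normalize(x):
    """Normalize a descriptor to unit length, as is common for matching."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def hierarchical_triplet_loss(anchor_levels, pos_levels, neg_levels, margin=0.2):
    """Illustrative sketch (not the paper's exact loss): sum a triplet
    margin loss over descriptors extracted at several feature levels,
    so that matching pairs are pulled together and non-matching pairs
    pushed apart at every level of the hierarchy.

    Each argument is a list of per-level descriptor vectors, e.g.
    [coarse_desc, mid_desc, fine_desc]."""
    loss = 0.0
    for a, p, n in zip(anchor_levels, pos_levels, neg_levels):
        a, p, n = l2_normalize(a), l2_normalize(p), l2_normalize(n)
        d_pos = np.linalg.norm(a - p)  # distance to the matching descriptor
        d_neg = np.linalg.norm(a - n)  # distance to a non-matching descriptor
        loss += max(0.0, margin + d_pos - d_neg)
    return loss
```

Summing the per-level terms is one simple way to realize the "hierarchical comparison" mentioned above: a pair must agree not only in the final descriptor but at every intermediate level.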