PRISM: PRogressive dependency maxImization for Scale-invariant image Matching

Published: 20 Jul 2024, Last Modified: 21 Jul 2024, MM2024 Poster, CC BY 4.0
Abstract: Image matching aims to identify corresponding points between a pair of images. Detector-free methods have recently shown impressive performance in challenging scenarios, thanks to their ability to generate dense matches and their global receptive field. However, performing feature interaction and proposing matches across the entire image is unnecessary, since not all image regions are useful for matching. Interacting and matching in unmatchable areas can introduce errors, reducing both matching accuracy and efficiency. Furthermore, scale discrepancy between the two images remains a problem for existing methods. To address these issues, we propose PRogressive dependency maxImization for Scale-invariant image Matching (PRISM), which jointly prunes irrelevant patch features and tackles scale discrepancy. To this end, we first present a Multi-scale Pruning Module (MPM) that adaptively prunes irrelevant features by maximizing the dependency between the two feature sets. Moreover, we design Scale-Aware Dynamic Pruning Attention (SADPA) to aggregate information from different scales via a hierarchical design. The superior matching performance and generalization capability of our method are confirmed by leading accuracy across various evaluation benchmarks and downstream tasks. The code will be publicly available.
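The abstract does not specify how MPM measures the dependency between the two feature sets. As a rough illustration of the general idea of pruning patches that have no plausible counterpart in the other image, the sketch below substitutes a dual-softmax matchability score (the coarse-matching score used by detector-free matchers such as LoFTR) and keeps only the top-scoring fraction of patches. This is a hypothetical stand-in, not PRISM's MPM: the function names `matchability_scores` and `prune_patches`, the `temperature` value, and the `keep_ratio` parameter are all assumptions made for this example.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """L2-normalize a feature vector (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matchability_scores(feats_a: Sequence[Vector], feats_b: Sequence[Vector],
                        temperature: float = 0.1) -> List[float]:
    """Hypothetical score: each patch of A gets its best dual-softmax match probability in B."""
    a = [_normalize(v) for v in feats_a]
    b = [_normalize(v) for v in feats_b]
    # Cosine-similarity logits, shape len(a) x len(b).
    logits = [[sum(x * y for x, y in zip(va, vb)) / temperature for vb in b] for va in a]
    # Softmax over B for each patch of A (rows), and over A for each patch of B (columns).
    rows = [_softmax(r) for r in logits]
    cols = [_softmax([logits[i][j] for i in range(len(a))]) for j in range(len(b))]
    # Dual softmax: a pair scores high only if each patch prefers the other.
    return [max(rows[i][j] * cols[j][i] for j in range(len(b))) for i in range(len(a))]


def prune_patches(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Return indices (in original order) of the top keep_ratio fraction of patches."""
    k = max(1, round(len(scores) * keep_ratio))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


if __name__ == "__main__":
    feats_b = [[1.0, 0.0], [0.0, 1.0]]
    feats_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # third patch matches nothing distinctly
    scores = matchability_scores(feats_a, feats_b)
    print(prune_patches(scores, keep_ratio=0.6))  # [0, 1]
```

In this toy example, the third patch of A is equally similar to both patches of B, so its dual-softmax score stays low and it is pruned, while the two patches with a distinct counterpart are kept. A full method would apply such pruning progressively at several scales and inside attention, which this sketch does not attempt.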
Primary Subject Area: [Content] Media Interpretation
Secondary Subject Area: [Content] Media Interpretation
Relevance To Conference: Our submission, PRISM: PRogressive dependency maxImization for Scale-invariant image Matching, advances multimedia content analysis by improving the image matching process. PRISM specifically addresses two challenges common in multimedia applications: irrelevant features and scale discrepancy. By selectively pruning irrelevant features and integrating information across different scales, our approach not only improves the precision of image matching but also broadens its applicability in multimedia contexts such as Augmented Reality, multimedia retrieval, and 3D reconstruction. These advances make PRISM a robust and scalable solution for complex multimedia processing tasks.
Supplementary Material: zip
Submission Number: 3020