Global-Local Bayesian Transformer for Semantic Correspondence

22 Sept 2022 (modified: 13 Feb 2023) · ICLR 2023 Conference Withdrawn Submission · Readers: Everyone
Abstract: Cost aggregation is key to finding semantic correspondence between a pair of similar images. Transformer-based cost aggregators have recently shown strong performance in producing high-quality correlation maps, owing to their ability to capture long-range dependencies between matching points. However, such models are data-hungry and prone to over-fitting when training data is limited, and they easily produce incorrect matches when correspondences must be resolved from local semantic context. To address these issues, we propose a Global-Local Bayesian Transformer (GLBT) for cost aggregation. Specifically, GLBT introduces a global Bayesian self-attention module, whose attention weights are sampled from a learnable Bayesian posterior distribution, to mitigate over-fitting while modeling long-range interactions in the correlation maps. To model short-range interactions between candidate matches, GLBT further introduces a local Bayesian self-attention module, which factorizes both the correlation maps and the Bayesian attention weights into pairs of patches and performs matrix multiplication on individual patch pairs rather than a direct dot-product over the whole map. The two self-attention modules are combined to capture both long-range and short-range interactions in the correlation maps. Finally, GLBT is applied hierarchically to refine the correlation maps before they are passed to the flow estimator. Extensive experiments show that the proposed network outperforms state-of-the-art methods on the SPair-71k, PF-PASCAL, and PF-WILLOW datasets.
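
To make the global/local attention design more concrete, below is a minimal PyTorch sketch of the two modules described in the abstract. It assumes a Gaussian reparameterized posterior that injects learnable multiplicative noise into the attention logits, which is one common way to realize Bayesian attention; the class names, noise parameterization, patch size, and other hyper-parameters are illustrative assumptions and are not taken from the paper.

import torch
import torch.nn as nn

class GlobalBayesianSelfAttention(nn.Module):
    """Sketch: multi-head self-attention whose attention logits are perturbed by
    samples from a learnable Gaussian posterior (reparameterization trick)."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Learnable posterior parameters of the per-head multiplicative noise
        # (hypothetical parameterization, not the paper's exact formulation).
        self.post_mu = nn.Parameter(torch.zeros(num_heads, 1, 1))
        self.post_logvar = nn.Parameter(torch.full((num_heads, 1, 1), -4.0))

    def forward(self, x):
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (B, heads, N, head_dim)
        logits = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        if self.training:
            # Sample multiplicative noise ~ N(1 + mu, sigma^2) per head.
            eps = torch.randn_like(logits)
            noise = 1.0 + self.post_mu + eps * torch.exp(0.5 * self.post_logvar)
            logits = logits * noise
        attn = logits.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

class LocalBayesianSelfAttention(nn.Module):
    """Sketch: the token sequence is split into non-overlapping patches and the
    Bayesian attention above is applied within each patch (short-range only)."""
    def __init__(self, dim, patch_size=16, num_heads=4):
        super().__init__()
        self.patch_size = patch_size
        self.attn = GlobalBayesianSelfAttention(dim, num_heads)

    def forward(self, x):
        B, N, C = x.shape
        assert N % self.patch_size == 0, "sequence length must be divisible by patch size"
        x = x.reshape(B * (N // self.patch_size), self.patch_size, C)
        x = self.attn(x)  # attention restricted to tokens inside the same patch
        return x.reshape(B, N, C)

As a usage illustration, the flattened correlation-map tokens could be passed through both modules in sequence, e.g. out = LocalBayesianSelfAttention(128)(GlobalBayesianSelfAttention(128)(torch.randn(2, 64, 128))); how the two outputs are actually fused and stacked hierarchically follows the paper's architecture, which this sketch does not reproduce.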