Learning to estimate smooth and accurate semantic correspondence

Huaiyuan Xu, Xiaodong Chen, Jiaqi Xi, Jing Liao

Published: 01 Jan 2021, Last Modified: 27 Sept 2024. Neurocomputing 2021. CC BY-SA 4.0

Abstract: We tackle the problem of estimating dense semantic correspondence between two images depicting different instances of the same category. Previous works estimate the correspondence of each pixel or patch independently; to overcome this drawback, we exploit semantic context and correspondence information from each pixel's neighborhood. To this end, we propose a novel network, SANet, with a trainable spatial aggregation module; it is trained end-to-end and outputs a semantic flow field. We train SANet with two complementary loss terms: a landmark loss, which focuses on keypoints with ground-truth annotations, and a consistency loss, which requires no ground truth and therefore applies to all pixels. Qualitative and quantitative experiments demonstrate that the proposed network achieves a better balance between accuracy and smoothness than the baseline and produces warped images of better visual quality.
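The abstract pairs a supervised loss on annotated keypoints with an unsupervised loss on every pixel. It does not give the exact form of either term, so the following is only a minimal sketch under stated assumptions. The landmark loss is assumed to be the mean endpoint error between flowed source keypoints and their ground-truth target keypoints. The consistency loss is replaced by a forward-backward cycle check (A→B→A should return to the start), which is one common ground-truth-free choice and not necessarily the authors' formulation. Flow lookups use nearest-pixel sampling rather than bilinear interpolation, and the weight `lam` is a hypothetical hyperparameter. The spatial aggregation module itself is not modeled.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]
# flow[y][x] = (dx, dy): displacement in pixels from image A to image B (assumed layout)
Flow = List[List[Vec]]


def sample_nearest(flow: Flow, x: float, y: float) -> Vec:
    """Look up the flow at (x, y), rounding to the nearest pixel and clamping to the image bounds."""
    h, w = len(flow), len(flow[0])
    xi = min(max(int(round(x)), 0), w - 1)
    yi = min(max(int(round(y)), 0), h - 1)
    return flow[yi][xi]


def landmark_loss(flow: Flow, src_kps: Sequence[Vec], tgt_kps: Sequence[Vec]) -> float:
    """Assumed landmark loss: mean endpoint error at keypoints that have ground truth."""
    total = 0.0
    for (sx, sy), (tx, ty) in zip(src_kps, tgt_kps):
        dx, dy = sample_nearest(flow, sx, sy)
        total += math.hypot(sx + dx - tx, sy + dy - ty)
    return total / len(src_kps)


def consistency_loss(flow_ab: Flow, flow_ba: Flow) -> float:
    """Stand-in consistency loss (forward-backward check) averaged over every pixel; needs no labels."""
    h, w = len(flow_ab), len(flow_ab[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            dx, dy = flow_ab[y][x]
            # Follow the forward flow into B, then read the backward flow there.
            bx, by = sample_nearest(flow_ba, x + dx, y + dy)
            # A perfect round trip gives dx + bx == 0 and dy + by == 0.
            total += math.hypot(dx + bx, dy + by)
    return total / (h * w)


def total_loss(flow_ab: Flow, flow_ba: Flow, src_kps: Sequence[Vec],
               tgt_kps: Sequence[Vec], lam: float = 0.1) -> float:
    """Weighted sum of both terms; `lam` is a hypothetical balancing weight."""
    return landmark_loss(flow_ab, src_kps, tgt_kps) + lam * consistency_loss(flow_ab, flow_ba)


if __name__ == "__main__":
    fwd = [[(1.0, 0.0)] * 4 for _ in range(3)]   # every pixel moves one step right
    bwd = [[(-1.0, 0.0)] * 4 for _ in range(3)]  # and back again
    print(total_loss(fwd, bwd, [(0, 0), (1, 1)], [(1, 0), (2, 2)]))  # 0.5
```

In this toy example the forward and backward flows cancel exactly, so the consistency term is zero and the total equals the landmark term: the first keypoint lands on its target and the second is off by one pixel, giving a mean error of 0.5. The split mirrors the abstract's motivation: the landmark term anchors accuracy where labels exist, while the label-free term constrains every other pixel.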