Domain-Generic Pre-Training for Low-Cost Entity Matching via Domain Alignment and Domain Antagonism

Published: 01 Jan 2023, Last Modified: 17 Apr 2025. Venue: IJCNN 2023. License: CC BY-SA 4.0
Abstract: Entity matching (EM) is a core problem in data mining and data integration. Existing EM solutions achieve great success by designing and training deep learning models for specific domains, e.g., watches and shoes. However, these methods incur a high training cost (e.g., model engineering and data preprocessing) in realistic EM applications. In this paper, we develop a deep learning-based solution built on domain-generic pre-training that targets low training cost for EM through a novel combination of domain alignment and domain antagonism. For domain alignment, we design a novel contrastive learning method that aligns the representation distributions of different domains. For domain antagonism, we conduct domain adversarial training to force the encoder to focus on domain-generic knowledge. Together, these two optimizations ensure that the pre-trained EM model captures general matching knowledge and can be fine-tuned to specific domains at a fairly low cost. Empirical evaluation demonstrates that this combination achieves state-of-the-art performance in both in-domain and out-of-domain settings.
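The abstract does not specify the exact form of either component, so the following is only a minimal, dependency-free sketch of two widely used building blocks that fit the description, not the authors' method. For domain alignment it assumes a generic InfoNCE contrastive loss, where `anchors[i]` and `positives[i]` are hypothetical encoder representations that should be pulled together (for example, matching records drawn from two different domains). For domain antagonism it assumes a gradient reversal layer in the style of domain-adversarial neural networks (DANN), which is one common way to implement domain adversarial training. The names `info_nce`, `grad_reverse_forward`, `grad_reverse_backward`, `tau`, and `lam` are all illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def info_nce(anchors: List[Vector], positives: List[Vector], tau: float = 0.1) -> float:
    """Mean InfoNCE loss over a batch (hypothetical stand-in for the paper's
    alignment loss). anchors[i] should score highest against positives[i];
    every other row of `positives` serves as an in-batch negative."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / tau for p in positives]
        m = max(logits)  # log-sum-exp with max subtraction for stability
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]  # equals -log softmax(logits)[i]
    return total / len(anchors)


def grad_reverse_forward(features: List[float]) -> List[float]:
    """Gradient reversal layer, forward pass: the identity."""
    return list(features)


def grad_reverse_backward(upstream_grad: List[float], lam: float) -> List[float]:
    """Gradient reversal layer, backward pass: scale the gradient coming from
    the domain classifier by -lam before it reaches the encoder, so the
    encoder is pushed to increase the domain-classification loss."""
    return [-lam * g for g in upstream_grad]


if __name__ == "__main__":
    a = [[1.0, 0.0], [0.0, 1.0]]
    print("aligned pairs:", info_nce(a, [[1.0, 0.0], [0.0, 1.0]]))
    print("mismatched pairs:", info_nce(a, [[0.0, 1.0], [1.0, 0.0]]))
    print("reversed grad:", grad_reverse_backward([1.0, -2.0], lam=0.5))
```

In a DANN-style setup, the matching loss, the alignment loss, and the domain-classification loss are typically summed and minimized jointly: the domain classifier receives its gradient unchanged and learns to tell domains apart, while the reversal layer flips that gradient for the encoder, pushing it toward domain-invariant features. How the paper weights these terms is not stated in the abstract.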
