Abstract: Entity matching (EM) is a core problem in data mining and data integration. Existing EM solutions achieve strong results by designing and training a deep learning model for a specific domain, e.g., watches or shoes. However, these methods incur a high training cost (e.g., model engineering and data preprocessing) in realistic EM applications. In this paper, we develop a deep learning-based solution built on domain-generic pre-training that targets low training cost for EM through a novel combination of domain alignment and domain antagonism. In domain alignment, we design a novel contrastive learning method that aligns the representation distributions of different domains. In domain antagonism, we apply domain adversarial training to force the encoder to focus on domain-generic knowledge. These two optimizations ensure that the pre-trained EM model captures general matching knowledge and can be fine-tuned to specific domains at a fairly low cost. Empirical evaluation demonstrates that this combination achieves state-of-the-art performance in both in-domain and out-of-domain settings.