QA-Matcher: Unsupervised Entity Matching Using a Question Answering Model

Published: 01 Jan 2023, Last Modified: 18 Jun 2024. Venue: PAKDD (4) 2023. License: CC BY-SA 4.0
Abstract: Entity matching (EM) is a fundamental task in data integration that involves identifying records referring to the same real-world entity. Unsupervised EM is often preferred in real-world applications because labeling data is labor-intensive. However, existing unsupervised methods may not always perform well, because the assumptions they rely on may not hold for tasks in different domains. In this paper, we propose QA-Matcher, an unsupervised EM model that is domain-agnostic and does not require any particular assumptions. Our idea is to frame EM as question answering (QA) by utilizing a trained QA model. Specifically, we generate a question that asks which record has the characteristics of a particular record, together with a passage that describes other records. We then use the trained QA model to predict the record pair corresponding to the question and its answer as a match. QA-Matcher leverages the power of a QA model to represent the semantics of various types of entities, allowing it to identify identical entities in a QA-like fashion. In extensive experiments on 16 real-world datasets, we demonstrate that QA-Matcher outperforms unsupervised EM methods and is competitive with supervised methods.
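The question-and-passage framing described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the record serialization, the question template, the `Record i is ...` passage layout, the character-span bookkeeping, and especially `toy_qa_model`, which is a token-overlap stand-in for a real trained QA model and only demonstrates the interface (question and passage in, answer start offset out).

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, str]
# Hypothetical interface: (question, passage) -> start offset of the answer, or None.
QAModel = Callable[[str, str], Optional[int]]


def serialize(record: Record) -> str:
    """Flatten a record into 'attribute: value' text (an assumed format)."""
    return ", ".join(f"{key}: {value}" for key, value in record.items())


def build_question(query: Record) -> str:
    """Ask which record has the characteristics of the query record."""
    return f"Which record has the characteristics of [{serialize(query)}]?"


def build_passage(candidates: List[Record]) -> Tuple[str, List[Tuple[int, int]]]:
    """Describe the other records in one passage and track each record's character span."""
    parts: List[str] = []
    spans: List[Tuple[int, int]] = []
    pos = 0
    for i, record in enumerate(candidates):
        text = f"Record {i} is {serialize(record)}."
        spans.append((pos, pos + len(text)))
        parts.append(text)
        pos += len(text) + 1  # +1 accounts for the joining space
    return " ".join(parts), spans


def match(query: Record, candidates: List[Record], qa_model: QAModel) -> Optional[int]:
    """Return the index of the candidate whose span contains the model's answer."""
    question = build_question(query)
    passage, spans = build_passage(candidates)
    answer_start = qa_model(question, passage)
    if answer_start is None:
        return None
    for i, (start, end) in enumerate(spans):
        if start <= answer_start < end:
            return i
    return None


def toy_qa_model(question: str, passage: str) -> Optional[int]:
    """Stand-in for a trained QA model: picks the sentence sharing the most tokens with the question."""
    question_tokens = set(re.findall(r"\w+", question.lower()))
    best_start, best_score = None, 0
    for sentence in re.finditer(r"Record \d+ is [^.]*\.", passage):
        sentence_tokens = set(re.findall(r"\w+", sentence.group().lower()))
        score = len(question_tokens & sentence_tokens)
        if score > best_score:
            best_start, best_score = sentence.start(), score
    return best_start


# Made-up example records, for illustration only.
query = {"title": "apple iphone 13 128gb", "brand": "apple"}
candidates = [
    {"title": "samsung galaxy s21", "brand": "samsung"},
    {"title": "iphone 13 apple 128gb black", "brand": "apple"},
    {"title": "apple watch series 7", "brand": "apple"},
]
best = match(query, candidates, toy_qa_model)
```

In practice, `toy_qa_model` would be replaced by a wrapper around an extractive QA model that returns the answer's start offset in the passage. A confidence threshold on the model's answer score could also be used to decide that no candidate matches, though whether QA-Matcher does this is not stated in the abstract.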