Unsupervised Domain Adaptation for Entity Blocking Leveraging Large Language Models

Published: 01 Jan 2024 · Last Modified: 21 May 2025 · IEEE Big Data 2024 · CC BY-SA 4.0
Abstract: Entity blocking, which aims to find all potentially matched tuple pairs in large-scale data, is an important step in entity resolution. It is non-trivial because it must balance effectiveness and efficiency, and the emergence of representation learning has made this balance attainable. However, existing representation learning models for entity blocking all require self-curated training instances in the target domain, which limits their applicability to unseen data. In this paper, we propose UDAEB, a framework for Unsupervised Domain Adaptation for Entity Blocking that fine-tunes representations across the source and target domains with contrastive learning, leveraging the capabilities of large language models (LLMs). UDAEB first adopts an adversarial learning strategy as a warmup step to reduce the distribution discrepancy between the source and target domains. Based on the initially learned representations, UDAEB then employs pre-trained LLMs to enrich robust and distinguishable attributes for both domains. Furthermore, we propose an iterative step that fine-tunes the entity blocking model on high-quality pseudo-labeled training instances selected with the help of LLMs. Finally, we conduct comprehensive experiments showing that UDAEB outperforms state-of-the-art algorithms in terms of pair completeness (PC), pair quality (PQ), and candidate set size ratio (CSSR).
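
The abstract does not include code, so the following is a minimal, hedged sketch of two of the steps it describes: the adversarial warmup that reduces the source/target distribution discrepancy (realized here with a gradient-reversal layer, a standard device for adversarial domain adaptation), and a contrastive fine-tuning step over pseudo-labeled tuple pairs. All module names, dimensions (e.g., the 768-dimensional input embeddings), and the InfoNCE-style loss are illustrative assumptions, not UDAEB's actual implementation.

```python
# Illustrative sketch only; not the paper's code. Assumes tuples have already
# been embedded (e.g., by a pre-trained language model) into 768-d vectors.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates gradients in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

encoder = nn.Sequential(nn.Linear(768, 256), nn.ReLU())   # blocking encoder (assumed sizes)
domain_clf = nn.Linear(256, 2)                            # predicts source vs. target domain
opt = torch.optim.Adam(list(encoder.parameters()) + list(domain_clf.parameters()), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def warmup_step(src_emb, tgt_emb, lambd=1.0):
    """One adversarial warmup step: the discriminator learns to tell domains
    apart while reversed gradients push the encoder toward domain-invariant
    representations, shrinking the source/target discrepancy."""
    feats = encoder(torch.cat([src_emb, tgt_emb]))
    labels = torch.cat([torch.zeros(len(src_emb)), torch.ones(len(tgt_emb))]).long()
    logits = domain_clf(GradReverse.apply(feats, lambd))
    loss = loss_fn(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def contrastive_step(anchor_emb, pos_emb, tau=0.05):
    """One contrastive fine-tuning step over pseudo-labeled pairs (here assumed
    to be pre-selected, e.g., by an LLM): matched tuples are pulled together
    and in-batch non-matches pushed apart via an InfoNCE-style loss."""
    a = nn.functional.normalize(encoder(anchor_emb), dim=1)
    p = nn.functional.normalize(encoder(pos_emb), dim=1)
    logits = a @ p.t() / tau           # in-batch cosine similarities
    labels = torch.arange(len(a))      # diagonal entries are the true matches
    loss = loss_fn(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy usage with random embeddings standing in for source/target tuples.
print(warmup_step(torch.randn(32, 768), torch.randn(32, 768)))
print(contrastive_step(torch.randn(32, 768), torch.randn(32, 768)))
```

In the paper's iterative step, the pairs fed to the contrastive loss would come from LLM-vetted pseudo-labels rather than random tensors, with the warmup and fine-tuning alternated as training proceeds.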