Abstract: Entity Resolution (ER) is a core task in data integration and cleaning, but the generalization ability of supervised methods is often limited when labeled data is scarce. One promising direction to overcome this challenge is leveraging large language models (LLMs), which have achieved remarkable progress in text semantic understanding. However, directly applying these models to graph-based entity resolution still faces two major challenges. First, blocking mechanisms for graph data are often inefficient, resulting in high computational costs. Second, graph structural information (e.g., neighbor associations, relation paths) is difficult to inject effectively into the model through natural language prompts. To address these challenges, we propose Graph-Aware Probabilistic Linking (GAPLink), a novel two-stage dynamic inference framework. In the first stage, we introduce an entropy-driven rule selection mechanism based on lightweight Graph Differential Dependencies (GDDs) to filter out structurally incompatible matching candidates. In the second stage, we develop a rule-prompt co-compilation strategy that explicitly encodes graph patterns into LLM prompts, guiding deep semantic matching on the pruned subgraphs. We conduct extensive experiments on multiple standard benchmark datasets, covering both relational and graph data. Experimental results show that GAPLink significantly outperforms existing methods, demonstrating strong robustness and generalization under missing labels and cross-domain adaptation. Our code and datasets are publicly available.
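The abstract does not give the concrete selection criterion, but the first stage can be illustrated with a minimal sketch: score each rule by the Shannon entropy of its match decisions over the candidate pairs, keep the confident (low-entropy) rules, and prune pairs that violate them. All function names, the rule representation (a boolean predicate over a candidate pair), and the entropy threshold here are hypothetical choices for illustration, not the paper's actual API.

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a Bernoulli outcome with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def select_rules(rules, candidates, threshold=0.9):
    """Keep GDD-style rules whose match decisions over the candidate
    pairs are confident, i.e. low-entropy (hypothetical criterion)."""
    selected = []
    for rule in rules:
        hits = sum(1 for pair in candidates if rule(pair))
        p = hits / len(candidates)
        if entropy(p) <= threshold:
            selected.append(rule)
    return selected

def prune(candidates, selected_rules):
    """Filter out pairs that violate any selected structural rule."""
    return [pair for pair in candidates
            if all(rule(pair) for rule in selected_rules)]
```

For example, with a toy rule requiring both entities to share a type, `select_rules` keeps the rule when most pairs agree, and `prune` then discards the structurally incompatible pairs before the second-stage LLM matching.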
External IDs: dblp:conf/icic/WangMKBHF25