Experimental Evaluation Among Reblocking Techniques Applied to the Entity Resolution

Published: 2021, Last Modified: 21 Jan 2026, Venue: ADBIS 2021, License: CC BY-SA 4.0
Abstract: Entity Resolution (ER), an essential task in the data integration process, identifies records that refer to the same real-world object. A naive approach compares all pairs of records in a dataset, which is costly, especially for large-scale datasets. To mitigate this cost, several blocking techniques proposed in the literature restrict comparisons to records grouped in the same block. To further reduce the number of comparisons, some approaches, named reblocking, reprocess the blocks produced by blocking. Reblocking techniques fall into two major groups: meta-blocking and filtering. Meta-blocking reduces the number of comparisons based on the blocks shared by the records. Filtering, on the other hand, selects pairs of records for comparison based on the degree of similarity between them. Although both approaches have the same goal, to the best of our knowledge no work in the literature experimentally compares reblocking techniques. To fill this gap, we present a qualitative and comparative analysis of state-of-the-art reblocking techniques. This analysis covers different characteristics for assessing the effectiveness and efficiency of the techniques. Finally, we specify appropriate scenarios for each evaluated technique.
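The pipeline the abstract describes (naive pairing, blocking, then reblocking) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the paper's method: the toy records, the use of token blocking as the blocking step, a plain "minimum number of shared blocks" rule standing in for meta-blocking, and a Jaccard-similarity threshold standing in for filtering. The function names and thresholds are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Toy records (hypothetical data, not taken from the paper).
records = {
    1: "john smith new york",
    2: "jon smith new york",
    3: "mary jones new haven",
    4: "maria jones new haven",
    5: "alan turing london",
}

def naive_pairs(ids):
    """Naive ER: every unordered pair, i.e. n * (n - 1) / 2 comparisons."""
    return set(combinations(sorted(ids), 2))

def token_blocks(recs):
    """Token blocking (assumed blocking step): one block per distinct token."""
    blocks = defaultdict(set)
    for rid, text in recs.items():
        for token in set(text.split()):
            blocks[token].add(rid)
    return blocks

def shared_block_counts(blocks):
    """For each pair that co-occurs in some block, count the blocks it shares."""
    counts = defaultdict(int)
    for members in blocks.values():
        for pair in combinations(sorted(members), 2):
            counts[pair] += 1
    return counts

def meta_blocking(counts, min_shared):
    """Simplified meta-blocking: keep pairs sharing at least `min_shared` blocks."""
    return {pair for pair, c in counts.items() if c >= min_shared}

def jaccard(a, b):
    """Jaccard similarity between the token sets of two strings."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)

def filtering(recs, pairs, threshold):
    """Simplified filtering: keep pairs whose similarity reaches `threshold`."""
    return {(a, b) for a, b in pairs if jaccard(recs[a], recs[b]) >= threshold}

counts = shared_block_counts(token_blocks(records))
print("naive:", len(naive_pairs(records)))
print("after blocking:", len(counts))
print("meta-blocking:", sorted(meta_blocking(counts, min_shared=2)))
print("filtering:", sorted(filtering(records, counts.keys(), threshold=0.5)))
```

On this toy data the frequent token "new" puts records 1 to 4 in one block, so blocking alone still yields superfluous pairs; both simplified reblocking rules discard them, one using shared-block evidence and the other using record similarity. Published meta-blocking and filtering techniques use more elaborate weighting and pruning schemes than these stand-ins.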