IntelliClean: a knowledge-based intelligent data cleaner

2000 (modified: 16 Jul 2019), KDD 2000. Readers: Everyone
Abstract: Data cleaning methods work on the basis of computing the degree of similarity between nearby records in a sorted database. High recall is achieved by accepting records with low degrees of similarity as duplicates, at the cost of lower precision. High precision is achieved analogously at the cost of lower recall. This is the recall-precision dilemma. In this paper, we propose a generic knowledge-based framework for effective data cleaning that implements existing cleaning strategies and more. We develop a new method to compute transitive closure under uncertainty which handles the merging of groups of inexact duplicate records. Experimental results show that this framework can identify duplicates and anomalies with high recall and precision.
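To make the recall-precision dilemma concrete, here is a minimal sketch of the general approach the abstract refers to: sort the records, compare each one with its near neighbours, and accept pairs whose similarity clears a threshold. It is not the IntelliClean framework or its method for transitive closure under uncertainty. The function names, the window size, the thresholds, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions chosen for illustration, and the grouping step is an ordinary union-find closure that ignores uncertainty.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Degree of similarity in [0, 1]; an assumed stand-in measure."""
    return SequenceMatcher(None, a, b).ratio()


def candidate_pairs(records, window, threshold):
    """Compare each record with the next (window - 1) records in sorted order."""
    order = sorted(range(len(records)), key=lambda i: records[i])
    pairs = set()
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:
            if similarity(records[i], records[j]) >= threshold:
                pairs.add((min(i, j), max(i, j)))
    return pairs


def merge_groups(n, pairs):
    """Plain transitive closure over accepted pairs (no uncertainty handling)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in pairs:
        parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(g for g in groups.values() if len(g) > 1)


# Hypothetical records, for illustration only.
records = ["john smith", "john smith", "jon smith", "mary jones", "zed alpha"]

strict = candidate_pairs(records, window=3, threshold=1.0)  # favours precision
loose = candidate_pairs(records, window=3, threshold=0.9)   # favours recall

print(sorted(strict))
print(sorted(loose))
print(merge_groups(len(records), loose))
```

Lowering the threshold can only add accepted pairs, never remove them, so recall rises while the risk of false matches grows with it. Because the closure step merges every accepted pair unconditionally, one false match can chain unrelated records into a single group, which is the kind of problem a closure that accounts for uncertainty is meant to address.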