Abstract: Identifying and resolving inconsistencies and conflicts in data is crucial. To tackle inconsistencies, integrity constraints are used to restrict the attribute values of related entities. To tackle multi-source conflicts, the true values of each entity are identified by trusting the reliable sources. In practice, inconsistencies and conflicts often appear together. Traditional techniques handle this case by resolving the two separately, applying a different approach to each based on the principles above. However, such a procedure may not be appropriate. Specifically, resolving conflicts locally for one entity may overlook information from its related entities, while enforcing constraints on related entities may in turn discard correct values of those entities. To resolve inconsistencies and conflicts jointly, this paper proposes a novel technique driven by both integrity constraints and source reliability. The key component of our solution is to incorporate denial constraints, an expressive type of integrity constraint, into the conflict resolution process. We formulate this as an optimization problem and develop an iterative algorithm to solve it. With this algorithm, the repaired result is both supported by reliable sources and consistent with the denial constraints. We also propose two optimization strategies that keep the approach scalable under a large number of constraints. Experimental results on real-world datasets demonstrate the high accuracy and scalability of the proposed approach.
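As a rough illustration of why resolving conflicts separately can fall short, the sketch below picks each attribute value by reliability-weighted voting and then checks the result against one denial constraint. It is not the paper's algorithm: the claims, the source weights, the `salary`/`tax` attributes, and the constraint itself (no entity may have a higher salary but a lower tax than another) are all assumptions made up for this example.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical claims: (source, entity, attribute, value).
CLAIMS = [
    ("s1", "alice", "salary", 50), ("s2", "alice", "salary", 50),
    ("s3", "alice", "salary", 90),
    ("s1", "alice", "tax", 5), ("s3", "alice", "tax", 9),
    ("s1", "bob", "salary", 70), ("s2", "bob", "salary", 70),
    ("s1", "bob", "tax", 4), ("s2", "bob", "tax", 4),
    ("s3", "bob", "tax", 7),
]

# Assumed, fixed source reliabilities (a real method would estimate these).
WEIGHTS = {"s1": 0.9, "s2": 0.6, "s3": 0.8}


def weighted_vote(claims, weights):
    """Pick, per (entity, attribute), the value with the highest total weight."""
    scores = defaultdict(lambda: defaultdict(float))
    for source, entity, attr, value in claims:
        scores[(entity, attr)][value] += weights[source]
    records = defaultdict(dict)
    for (entity, attr), votes in scores.items():
        records[entity][attr] = max(votes, key=votes.get)
    return dict(records)


def dc_violations(records):
    """Return ordered pairs (a, b) that violate the assumed denial constraint:
    NOT (a.salary > b.salary AND a.tax < b.tax)."""
    return [
        (a, b)
        for a, b in permutations(sorted(records), 2)
        if records[a]["salary"] > records[b]["salary"]
        and records[a]["tax"] < records[b]["tax"]
    ]


resolved = weighted_vote(CLAIMS, WEIGHTS)
violations = dc_violations(resolved)
print(resolved)
print(violations)
```

In this toy data, voting alone selects values for each entity that are individually well supported, yet the pair (bob, alice) still violates the constraint. This is the kind of case that motivates taking constraints into account during conflict resolution rather than afterwards.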