Robust Entity Resolution using Random Graphs

2018 (modified: 19 Feb 2025), SIGMOD Conference 2018
Abstract: Entity resolution (ER) seeks to identify which records in a data set refer to the same real-world entity. Given the diversity of ways in which entities can be represented, matched and distinguished, ER is known to be a challenging task for automated strategies, but relatively easy for expert humans. In our work, we abstract the knowledge of experts with the notion of a binary oracle. Our oracle can answer questions of the form "do records u and v refer to the same entity?" under a flexible error model, which allows some questions to be more difficult to answer correctly than others. Our contribution is a general error-correction tool that can be leveraged by a variety of hybrid human-machine ER algorithms, based on a formal way of selecting indirect "control queries". In our experiments we demonstrate that correction-less ER algorithms equipped with our tool can perform even better than recent ER algorithms specifically designed for correcting errors. Our control queries are selected among those that provide the strongest connectivity between the records of each cluster, based on the concept of graph expanders (sparse graphs with formal connectivity properties). We give formal performance guarantees for our toolkit and provide experiments on real and synthetic data.
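The idea of drawing control queries from a sparse but well-connected graph can be illustrated with a small Python sketch. It is a hypothetical simplification, not the paper's toolkit: the names `random_cycle_union_graph`, `make_noisy_oracle` and `verify_cluster` are invented here, the oracle is simulated, and the sparse graph is built as a union of random Hamiltonian cycles, a common way to obtain an expander with high probability. Within a candidate cluster, the sketch queries only the graph's edges, keeps the pairs the oracle confirms, and splits the cluster into connected components.

```python
# Illustrative sketch only: hypothetical names and a simplified scheme,
# not the authors' error-correction method.
import random
from collections import defaultdict


def random_cycle_union_graph(nodes, num_cycles, rng):
    """Edge set of a union of `num_cycles` random Hamiltonian cycles.

    Each node has degree at most 2 * num_cycles, so the graph is sparse;
    random graphs of this kind are expanders with high probability.
    Nodes are assumed to be mutually comparable record ids.
    """
    edges = set()
    n = len(nodes)
    if n < 2:
        return edges
    for _ in range(num_cycles):
        perm = list(nodes)
        rng.shuffle(perm)
        for i in range(n):
            u, v = perm[i], perm[(i + 1) % n]
            edges.add((min(u, v), max(u, v)))  # undirected edge, canonical order
    return edges


def make_noisy_oracle(truth, error_prob, rng):
    """Hypothetical simulated oracle: 'do u and v refer to the same entity?'

    `truth` maps record -> entity id. `error_prob(u, v)` lets some pairs be
    harder than others; the true answer is flipped with that probability.
    """
    def oracle(u, v):
        same = truth[u] == truth[v]
        return same if rng.random() >= error_prob(u, v) else not same
    return oracle


def verify_cluster(cluster, oracle, num_cycles=3, seed=0):
    """Split a candidate cluster using expander-style control queries.

    Queries the oracle only on the edges of a sparse random graph, keeps the
    edges answered 'yes', and returns the connected components, largest first.
    """
    rng = random.Random(seed)
    nodes = sorted(cluster)
    adj = defaultdict(set)
    for u, v in random_cycle_union_graph(nodes, num_cycles, rng):
        if oracle(u, v):  # keep only positively answered control queries
            adj[u].add(v)
            adj[v].add(u)
    seen, components = set(), []
    for start in nodes:  # iterative DFS over the 'yes' graph
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            x = stack.pop()
            comp.append(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        components.append(sorted(comp))
    components.sort(key=len, reverse=True)  # stable: ties keep discovery order
    return components


if __name__ == "__main__":
    truth = {i: "A" for i in range(10)}
    truth[10] = "B"  # one foreign record mistakenly placed in the cluster
    oracle = make_noisy_oracle(truth, lambda u, v: 0.0, random.Random(1))
    print(verify_cluster(set(truth), oracle))
    # prints [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10]]
```

Because cutting a large set of nodes out of an expander requires removing many edges, a few erroneous "no" answers inside a true cluster tend to leave it connected, while a foreign record whose control queries are answered "no" ends up isolated. The sketch does not handle erroneous "yes" answers that attach a foreign record, which a real error-correction scheme would also need to address.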