Paper Link: https://openreview.net/forum?id=xaWIeItQ7zb
Paper Type: Long paper (up to eight pages of content + unlimited references and appendices)
Abstract: Recent work has shown that entity resolution (ER) can be performed effectively by gradual machine learning (GML). GML begins with easy instances, which the machine can label automatically with high accuracy, and then gradually labels more challenging instances through iterative knowledge conveyance in a factor graph. Because it requires no manual labeling, the current GML solution for ER is unsupervised; however, its performance is limited by inaccurate and insufficient knowledge conveyance. This motivates investigating how manual labeling effort can improve knowledge conveyance. In this paper, we propose an active learning (AL) approach based on GML for ER. It iteratively generates new knowledge in the form of one-sided rules through manual label verification and instills these rules into the factor graph to improve knowledge conveyance. We first present a knowledge discovery technique based on genetic mutations, which generates effective knowledge rules at very small manual verification cost. We then show how to leverage the generated rules for improved knowledge conveyance by measuring their influence on label status with the metric of skyline distance. We evaluate the proposed approach in a comparative study on real benchmark data. Our extensive experiments show that it significantly improves the performance of unsupervised GML at very small manual cost; furthermore, it outperforms state-of-the-art AL solutions for deep learning by considerable margins in terms of learning efficiency.
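
The easy-first labeling order described in the abstract can be illustrated with a toy sketch. This is not the paper's method: the factor-graph inference, one-sided rules, genetic mutations, and skyline distance are all omitted, and a simple nearest-labeled-neighbor copy on a single hypothetical similarity score stands in for knowledge conveyance. The function name `gradual_label`, the thresholds `lo` and `hi`, and the sample scores are assumptions made only for illustration.

```python
from typing import Dict, List


def gradual_label(sims: List[float], lo: float = 0.1, hi: float = 0.9) -> Dict[int, int]:
    """Toy easy-first labeling of candidate record pairs.

    sims[i] is a hypothetical similarity score in [0, 1] for pair i.
    Returns a mapping from pair index to label (1 = match, 0 = non-match).
    """
    labels: Dict[int, int] = {}

    # Step 1: label the easy instances automatically from extreme scores.
    for i, s in enumerate(sims):
        if s >= hi:
            labels[i] = 1
        elif s <= lo:
            labels[i] = 0
    if not labels:
        raise ValueError("no easy instances to start from")

    # Step 2: repeatedly pick the unlabeled pair that lies closest to any
    # already-labeled pair and copy that neighbor's label. This stands in
    # for (and is much simpler than) inference in a factor graph.
    unlabeled = set(range(len(sims))) - set(labels)
    while unlabeled:
        _, u, nearest = min(
            (abs(sims[u] - sims[l]), u, l) for u in unlabeled for l in labels
        )
        labels[u] = labels[nearest]
        unlabeled.remove(u)
    return labels


if __name__ == "__main__":
    scores = [0.95, 0.05, 0.8, 0.15, 0.6]  # made-up similarity scores
    print(gradual_label(scores))
```

In this sketch the pairs with scores 0.95 and 0.05 are labeled first, and each remaining pair is labeled only once it has become the least ambiguous one left, which mirrors the "easy instances first, harder instances gradually" ordering in spirit only.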