Abstract: In relation extraction, distant supervision methods expand datasets by aligning entity pairs across knowledge bases and completing the relations between the two entities. However, these methods ignore the fact that sentence labels generated by distant supervision with high confidence are often incorrect in the real world; such errors are called Unknown Unknowns (UUs). To address this challenge, we propose a crowdsourcing-based human-in-the-loop denoising framework that iteratively discovers UUs and corrects them through crowdsourcing to improve relation extraction. In each epoch, we choose one sentence bag and repeat two steps. First, an attention-based Long Short-Term Memory network is applied as a selector to discover potential UUs. Second, these UUs are annotated by crowd workers using two answer-collecting strategies and fed back into the selector as positive samples. Once the accuracy of the selector reaches a threshold, all annotated samples are added to the relation classifier as a cleaned training set, and the framework moves on to the next epoch with new sentence bags. Experiments on the New York Times dataset and an analysis of potential UUs demonstrate that our framework denoises the dataset and outperforms all baselines on distant supervision relation extraction tasks.
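The select-annotate-feed-back loop described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names (`denoise_bag`, `majority_vote`), the callback interface, and the `max_rounds` cap are hypothetical; the selector is an arbitrary callable rather than an attention-based LSTM; and majority voting stands in for the two answer-collecting strategies, which the abstract does not specify.

```python
from collections import Counter


def majority_vote(answers):
    """Aggregate crowd answers for one sentence.

    Assumption: majority voting is used here only as a stand-in; the
    abstract does not say which answer-collecting strategies are used.
    """
    return Counter(answers).most_common(1)[0][0]


def denoise_bag(bag, select, ask_crowd, feed_back, accuracy,
                threshold, max_rounds=10):
    """Run the two-step loop on one sentence bag (hypothetical interface).

    bag        -- dict mapping sentence id to its distantly supervised label
    select     -- callable(labels) -> ids of suspected UUs (the selector)
    ask_crowd  -- callable(sentence_id) -> list of crowd answers
    feed_back  -- callable(annotated) that updates the selector
    accuracy   -- callable() -> current selector accuracy
    threshold  -- stop once accuracy() reaches this value
    max_rounds -- safety cap on iterations (an assumption, not from the text)
    """
    labels = dict(bag)   # working copy of the bag's labels
    annotated = {}       # crowd-corrected labels collected so far
    for _ in range(max_rounds):
        # Step 1: the selector proposes potential UUs not yet annotated.
        candidates = [s for s in select(labels) if s not in annotated]
        # Step 2: crowd workers annotate them; results go back to the selector.
        for s in candidates:
            annotated[s] = majority_vote(ask_crowd(s))
        labels.update(annotated)
        feed_back(annotated)
        if accuracy() >= threshold:
            break
    # The cleaned labels would then be passed to the relation classifier.
    return labels, annotated
```

In this sketch, the returned `labels` play the role of the cleaned training set for the bag, and `annotated` records which sentences the crowd corrected.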