Keywords: NER, unlabeled entity problem, PU-learning, negative sampling, self-supervision
Abstract: Named entity recognition (NER) models are largely developed on well-annotated data.
In many scenarios, however, entities are not fully annotated, which degrades performance.
A common approach to this problem is to use labeled data to distinguish unlabeled entities from negative instances.
However, the vast differences between entities make such empirical approaches difficult to realize.
Our solution is to handle unlabeled entities through both empirical inference and random sampling.
To this end, we propose a simple yet effective two-step method that combines a novel Positive-Unlabeled learning (PU-learning) algorithm with negative sampling; the PU-learning step separates part of the unlabeled entities from negative instances based on a confidence threshold.
In general, the proposed method can mitigate the impact of unlabeled entities at the outset and can be easily applied to any character-level NER model.
We verify the effectiveness of our method on several NER models and datasets, showing a strong ability to deal with unlabeled entities.
Finally, in real-world settings, we establish new state-of-the-art results on many benchmark NER datasets.
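The two-step idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how confidences are computed or how negative sampling is applied, so the function name `select_negatives`, the span representation, the `confidence` dictionary, and the `threshold` and `sample_ratio` parameters are all assumptions made for illustration. The sketch assumes that some model has already assigned each unannotated span a confidence of being an entity, and that negatives are drawn at random from the spans that fall below the threshold.

```python
import math
import random


def select_negatives(unlabeled_spans, confidence, threshold=0.9,
                     sample_ratio=0.5, seed=0):
    """Hypothetical two-step selection over spans with no entity annotation.

    unlabeled_spans: list of (start, end) tuples that carry no label.
    confidence: dict mapping each span to an assumed model estimate of the
        probability that the span is an entity.
    Returns (likely_entities, negatives).
    """
    # Step 1 (threshold step): spans at or above the confidence threshold
    # are treated as probable unlabeled entities and kept out of the
    # negative pool.
    likely_entities = [s for s in unlabeled_spans if confidence[s] >= threshold]
    candidates = [s for s in unlabeled_spans if confidence[s] < threshold]

    # Step 2 (negative sampling): draw a random subset of the remaining
    # spans to use as negative training instances.
    rng = random.Random(seed)
    k = min(len(candidates), math.ceil(sample_ratio * len(candidates)))
    negatives = rng.sample(candidates, k)
    return likely_entities, negatives


# Toy usage with made-up spans and confidences.
spans = [(0, 1), (2, 4), (5, 6), (7, 9)]
conf = {(0, 1): 0.05, (2, 4): 0.95, (5, 6): 0.20, (7, 9): 0.40}
likely, negs = select_negatives(spans, conf, threshold=0.9, sample_ratio=0.5)
```

In this toy run, the span `(2, 4)` exceeds the threshold and is withheld from the negative pool, while two of the three remaining spans are sampled as negatives. Which two are drawn depends on the random seed.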
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning