Abstract: Highlights
• Leverages data augmentation over a large amount of unlabeled data.
• Promotes consistency in inter-token dependencies between the augmented pairs.
• Is model-agnostic and can be used with any existing supervised NER model.
• Theoretically analyzes the amount of labeled data required to reach a given error rate.