Abstract: This paper describes the system proposed by the BIT-Event team for the NLPCC 2021 shared task on Subevent Identification. The task includes two settings, which face, respectively, limited reliable labeled data and the dilemma of selecting the most valuable data to annotate. Lacking abundant training data, we propose a hybrid system based on semi-supervised algorithms that improves performance by learning effectively from a large unlabeled corpus. In this hybrid model, we first fine-tune the pre-trained model to adapt it to the training data scenario. In addition, Adversarial Training and Virtual Adversarial Training are combined to improve the performance of a single model using unlabeled in-domain data. Additional information is then captured by retraining with pseudo-labels. Separately, we apply Active Learning as an iterative process that starts from a small number of labeled seed instances. The experimental results suggest that the semi-supervised methods fit the low-resource subevent identification problem well. Our best results were obtained by an ensemble of these methods. According to the official results, our approach achieved the best performance in all settings of this task.
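The two data-selection ideas mentioned in the abstract, retraining on pseudo-labels and choosing instances to annotate in Active Learning, can be illustrated with a minimal sketch. The abstract does not state the selection criteria the system uses, so everything here is an assumption made for illustration: the function names, the confidence threshold of 0.9 for pseudo-labels, and entropy-based uncertainty sampling for annotation are all hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_pseudo_labels(
    predictions: Sequence[Sequence[float]], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Return (example_index, predicted_label) pairs for confident predictions.

    Assumption: an unlabeled example is kept as a pseudo-labeled training
    instance only if its top class probability reaches `threshold`.
    """
    selected = []
    for i, probs in enumerate(predictions):
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            selected.append((i, label))
    return selected


def select_for_annotation(
    predictions: Sequence[Sequence[float]], budget: int
) -> List[int]:
    """Return indices of the `budget` most uncertain examples.

    Assumption: uncertainty is measured by prediction entropy, so the
    examples the current model is least sure about are annotated first.
    """
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,
    )
    return ranked[:budget]


if __name__ == "__main__":
    # Hypothetical model outputs for four unlabeled examples (binary task).
    preds = [[0.95, 0.05], [0.55, 0.45], [0.20, 0.80], [0.08, 0.92]]
    print(select_pseudo_labels(preds))      # confident examples for retraining
    print(select_for_annotation(preds, 2))  # uncertain examples to annotate
```

In an iterative loop, the model would be retrained after each round on the seed set plus the newly annotated or pseudo-labeled instances, and the predictions recomputed before the next selection.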