Towards Fully Self-Supervised Learning of Knowledge from Unstructured Text

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission · Readers: Everyone
Abstract: Pre-trained language models (PLMs) like BERT have made significant progress on various downstream NLP tasks. However, recent work finds that PLMs fall short in acquiring knowledge from unstructured text, as revealed by cloze-style tests. To understand the internal behavior of PLMs when retrieving knowledge, we first define knowledge-bearing (K-B) tokens and knowledge-free (K-F) tokens for unstructured text and manually label them on a set of samples. We then find that PLMs are more likely to make incorrect predictions on K-B tokens and to pay less attention to those tokens inside the self-attention module. Based on these observations, we develop two solutions that help the model learn more knowledge from unstructured text in a fully self-supervised manner. Experiments on knowledge probing tasks show the effectiveness of the proposed methods. To the best of our knowledge, we are the first to explore fully self-supervised learning of knowledge in continual pre-training.
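The abstract does not specify how the cloze-style probing is implemented; the sketch below is only an illustration of such a probe, assuming HuggingFace Transformers and bert-base-uncased (neither is stated by the authors). A knowledge fact is phrased as a masked sentence, and the PLM's top prediction for the masked (knowledge-bearing) slot is inspected.

```python
# Minimal sketch of a cloze-style knowledge probe with a masked LM.
# Assumptions: tooling (HuggingFace Transformers) and checkpoint
# (bert-base-uncased) are illustrative, not the authors' exact setup.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# A cloze query whose masked token is the knowledge-bearing slot.
text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and read off the model's top prediction.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(top_id))  # a model that has acquired the fact outputs "paris"
```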