How Can the [MASK] Know? The Sources and Limitations of Knowledge in BERT

Published: 01 Jan 2021. Last Modified: 28 Jan 2024. Venue: IJCNN 2021.
Abstract: We explore the idea of using the pre-trained BERT as a source of factual knowledge, analyze which components of the model are responsible for its ability to answer questions requiring factual knowledge, and study the transferability of this knowledge to downstream tasks. Our experiments show that the Language Modeling Head is indispensable for predicting facts, implying that the transferability of any knowledge captured in the model is limited. While the dominant approach to researching how knowledge is stored in language models focuses on tailoring question formulation to optimize retrieval quality, we find question patterns easily understood by humans that confuse BERT to the point that its answers no longer make sense. The nature of these patterns implies that the stored knowledge is fragile and based on token co-occurrence in the pre-training corpus, rather than on generalization or inference. Moreover, using a novel, hand-crafted dataset, we show that BERT is vulnerable to common misconceptions, which could have fatal effects on downstream applications. Overall, we conclude that BERT offers low and unreliable performance out of the box. Jupyter notebooks with the experiments are available on GitHub: https://github.com/tenpercent/knowledge-and-confusion
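The probing setup described in the abstract (querying BERT's factual knowledge through its Language Modeling Head) can be sketched as a cloze-style fill-mask query. This is a minimal illustration using the Hugging Face `transformers` library, not the paper's exact code; the model name and prompt are assumptions chosen for illustration, and running it requires downloading `bert-base-uncased`.

```python
# Cloze-style factual probing sketch: BERT's Language Modeling Head fills
# in [MASK], and we read off its top-scoring candidate tokens. The prompt
# below is an illustrative example, not taken from the paper's dataset.
from transformers import pipeline

# The "fill-mask" pipeline wraps the model together with its LM head,
# the component the paper finds indispensable for predicting facts.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Phrase a factual question as a statement with a masked slot.
predictions = fill_mask("The capital of France is [MASK].")

# Each prediction carries a candidate token and the model's score for it.
for p in predictions[:3]:
    print(f"{p['token_str']!r}: {p['score']:.3f}")
```

Because the answer is read directly from the LM head's token distribution, rephrasing the prompt can change the predictions substantially, which is the fragility the paper investigates.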