An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs

Anonymous

16 Nov 2021 (modified: 17 May 2023) · ACL ARR 2021 November Blind Submission
Abstract: Large knowledge graphs have been shown to benefit zero-shot evaluation of downstream tasks through continual pre-training of language models. Yet little is known about how to learn optimally from this knowledge, or how the resulting models perform on different task partitions. This paper studies the effect of model architectures, loss functions, and knowledge subsets on the generalization of zero-shot models across task partitions. Our experiments show that data size, model size, model architecture, and loss function all play an important role in the accuracy and generalizability of the models. Most of the improvement occurs on questions with short answers and dissimilar answer candidates, which matches the characteristics of the data used for pre-training. These findings inform future work that uses self-supervision with large knowledge graphs to build generalizable commonsense reasoning agents.
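To make the continual pre-training setup named in the abstract concrete, the sketch below shows one common way such self-supervised data is derived: verbalizing knowledge-graph triples into multiple-choice questions with sampled distractor answers. This is purely illustrative and not the authors' pipeline; the triple format, relation templates, and helper names are all assumptions.

```python
import random

# Hypothetical relation-to-question templates; the paper's actual
# verbalization scheme is not specified in this abstract.
TEMPLATES = {
    "AtLocation": "Where would you find {head}?",
    "UsedFor": "What is {head} used for?",
    "CapableOf": "What can {head} do?",
}

def triple_to_question(triple, all_tails, num_distractors=2):
    """Turn a (head, relation, tail) KG triple into a multiple-choice
    example, sampling distractors from the tails of other triples."""
    head, relation, tail = triple
    question = TEMPLATES[relation].format(head=head)
    # Negative candidates: tails that do not answer this triple.
    distractors = random.sample(
        [t for t in all_tails if t != tail], num_distractors
    )
    candidates = distractors + [tail]
    random.shuffle(candidates)
    return {"question": question, "candidates": candidates, "answer": tail}

# Toy triples standing in for a large knowledge graph.
triples = [
    ("a book", "AtLocation", "a library"),
    ("a knife", "UsedFor", "cutting"),
    ("a dog", "CapableOf", "barking"),
]
tails = [t for _, _, t in triples]
for tr in triples:
    print(triple_to_question(tr, tails))
```

Examples of this form could then be fed to a language model with, for instance, a multiple-choice or margin-based loss; the abstract's finding that gains concentrate on short answers and dissimilar candidates reflects exactly the shape of such synthesized data.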