- Abstract: Recent breakthroughs in large-scale pretrained language models have shown the effectiveness of self-training for natural language processing (NLP). In addition to standard syntactic and semantic NLP tasks, pretrained models achieve strong improvements on tasks that involve real-world knowledge, suggesting that large-scale language modeling could be an implicit method for capturing knowledge. In this work, we further investigate the extent to which pretrained models such as BERT capture knowledge, using a zero-shot fact completion task. Moreover, we propose a simple yet effective weakly supervised training objective that explicitly forces the model to incorporate knowledge about real-world entities. Models trained with our new objective yield significant improvements on the fact completion task. When applied to downstream tasks, our model also achieves consistent improvements over BERT on four entity-related question answering datasets (an average improvement of 2.7 F1 on WebQuestions, TriviaQA, SearchQA and Quasar-T) and on a standard fine-grained entity typing dataset (an accuracy gain of 5.7 on FIGER), establishing several new state-of-the-art results.