Abstract: Academic knowledge services have substantially facilitated the development of human science and technology, providing a wealth of useful research tools. However, many applications depend heavily on ad hoc models and expensive human labeling to understand professional content, hindering real-world deployment. To create a unified backbone language model for diverse knowledge-intensive academic knowledge mining tasks, we pre-train an academic language model, OAG-BERT, on the world's largest public academic graph, the Open Academic Graph (OAG), integrating massive heterogeneous entity knowledge beyond scientific corpora. We develop novel pre-training strategies along with zero-shot inference techniques. OAG-BERT's superior performance on nine knowledge-intensive academic tasks (including two demo applications) demonstrates that it can serve as a foundation for academic knowledge services. Its zero-shot capability also offers great potential to mitigate the need for costly annotations. OAG-BERT has been deployed in multiple real-world applications, such as reviewer recommendation for the NSFC (National Natural Science Foundation of China) and paper tagging in the AMiner system. All code and pre-trained models are available via CogDL.
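As a pointer for readers, the sketch below shows one way the released checkpoint might be loaded through CogDL and used like a standard BERT encoder. The import path (`cogdl.oag.oagbert`) and checkpoint name (`"oagbert-v2"`) are assumptions based on CogDL's example usage and should be verified against the current CogDL documentation.

```python
# Minimal sketch (assumed API): load OAG-BERT via CogDL and encode a paper title.
# NOTE: the import path and checkpoint name below are assumptions taken from
# CogDL's example usage; check the CogDL docs for the exact, current interface.
import torch
from cogdl.oag import oagbert

# Returns a (tokenizer, model) pair; downloads the pre-trained weights on first use.
tokenizer, model = oagbert("oagbert-v2")
model.eval()

title = "OAG-BERT: Towards a Unified Backbone Language Model for Academic Knowledge Services"
tokens = tokenizer(title, return_tensors="pt")

with torch.no_grad():
    outputs = model(**tokens)  # BERT-style forward pass producing contextual embeddings
```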