Abstract: Pretraining large neural language models, such as BERT, has led to impressive gains on many natural language processing (NLP) tasks. However, most pretraining efforts focus on general-domain corpora, such as newswire and the Web. A prevailing assumption is that even domain-specific pretraining can benefit by starting from general-domain language models. In this paper, we challenge this assumption by showing that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models. To facilitate this investigation, we compile a comprehensive biomedical NLP benchmark from publicly available datasets. Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks, leading to new state-of-the-art results across the board. Further, in conducting a thorough evaluation of modeling choices, both for pretraining and task-specific fine-tuning, we discover that some common practices are unnecessary with BERT models, such as using complex tagging schemes in named entity recognition (NER). To help accelerate research in biomedical NLP, we have released our state-of-the-art pretrained and task-specific models for the community, and created a leaderboard featuring our BLURB benchmark (short for Biomedical Language Understanding & Reasoning Benchmark) at https://aka.ms/BLURB.