Annotating the Pandemic: Named Entity Recognition and Normalisation in COVID-19 Literature

Nico Colic; Lenz Furrer; Fabio Rinaldi

29 Jun 2020 (modified: 05 May 2023); Submitted to NLP-COVID-2020; Readers: Everyone

Keywords: PubMed, NEN, NER, BioBERT

TL;DR: CRAFT-trained BioBERT and dictionary-based look-up used together to annotate COVID-19 papers

Abstract: The COVID-19 pandemic has been accompanied by such an explosive increase in media coverage and scientific publications that researchers find it difficult to keep up. We present a publicly available pipeline that performs named entity recognition and normalisation in parallel, to help find relevant publications and to aid downstream NLP tasks such as text summarisation. Our approach uses a dictionary-based system for its high recall in conjunction with two BioBERT-based models for their accuracy. Their outputs are combined according to different strategies depending on the entity type. In addition, we use a manually crafted dictionary to improve performance on new concepts related to COVID-19. We have previously evaluated our work on the CRAFT corpus, and we make the output of our pipeline available on two visualisation platforms.
