- Keywords: scattertext, scispaCy, Text Visualization, NER, NLP, CORD-19
- TL;DR: Interactive text visualization for exploring disease and chemical entities in the COVID-19 Open Research Dataset (CORD-19)
- Abstract: This work explores the use of Natural Language Processing (NLP) algorithms for large-scale text mining and interactive visualization of the COVID-19 Open Research Dataset (CORD-19). We developed a series of easy-to-use online interactive text visualizations, each based on a different percentage of the disease and chemical entities mined from CORD-19. These visualizations enable the study of patterns based on entity frequency in a very large collection of about 2.6 million disease and chemical entities extracted from 31,376 papers. They should be useful to medical professionals, especially those unfamiliar with data mining techniques, who can interact with disease, symptom, drug, and chemical text entities to study patterns, relationships, and trends, and to derive insights about COVID-19 from publications about it and similar diseases. The extracted entities will also be made publicly available so that further work can be done with the dataset.