Knowledge Discovery in COVID-19 Research Literature

Alejandro Piad-Morffis; Suilan Estevez-Velarde; Ernesto Luis Estevanell-Valladares; Yoan Gutiérrez; Andrés Montoyo; Rafael Muñoz; Yudivián Almeida-Cruz

Published: 05 Jul 2022, Last Modified: 24 May 2023. NLP-COVID19-EMNLP Poster.

Keywords: semantic annotation models, entity recognition, relation extraction, corpus, conditional random fields, subject-action-target

TL;DR: A 500-sentence corpus annotated with general-purpose semantics, used to train baseline machine learning models that extract entities and relations from COVID-related papers.

Abstract: This paper presents the preliminary results of an ongoing project that analyzes the growing body of scientific research published on the COVID-19 pandemic. A general-purpose semantic model is used to double-annotate a batch of 500 sentences that the researchers manually selected from the CORD-19 corpus. A baseline text-mining pipeline is then designed and evaluated on a large batch of 100,959 sentences. We present a qualitative analysis of the most interesting facts extracted automatically and highlight possible future lines of development. The preliminary results show that general-purpose semantic models are a useful tool for discovering fine-grained knowledge in large corpora of scientific documents.
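The abstract does not spell out the annotation schema. The keywords only name a "subject-action-target" structure, entity recognition, and relation extraction. The following is a minimal sketch, under stated assumptions, of how such annotations could be turned into (subject, action, target) facts and counted across many sentences to surface frequent ones. The labels "Concept" and "Action", the relation names "subject" and "target", and every class and function name here are hypothetical illustrations, not the authors' pipeline (which, per the keywords, uses conditional random fields for the extraction step itself).

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical annotated span; `start` (character offset) keeps two
# mentions with the same text distinct when used as dictionary keys.
@dataclass(frozen=True)
class Entity:
    text: str
    label: str  # assumed label set: "Concept" or "Action"
    start: int


# Hypothetical directed relation from an Action to the Concept it links to.
@dataclass(frozen=True)
class Relation:
    source: Entity
    dest: Entity
    label: str  # assumed relation names: "subject" or "target"


def extract_triples(relations):
    """Combine each Action's subjects and targets into (subject, action, target) triples."""
    subjects, targets = {}, {}
    for rel in relations:
        if rel.source.label != "Action":
            continue  # only Actions anchor a subject-action-target fact
        if rel.label == "subject":
            subjects.setdefault(rel.source, []).append(rel.dest.text)
        elif rel.label == "target":
            targets.setdefault(rel.source, []).append(rel.dest.text)
    triples = []
    for action, subj_texts in subjects.items():
        for subj in subj_texts:
            for tgt in targets.get(action, []):
                triples.append((subj, action.text, tgt))
    return triples


def count_facts(annotated_sentences):
    """Count how often each triple occurs across many annotated sentences."""
    counts = Counter()
    for relations in annotated_sentences:
        counts.update(extract_triples(relations))
    return counts


# Usage on an invented sentence, "masks reduce transmission":
masks = Entity("masks", "Concept", 0)
reduce_ = Entity("reduce", "Action", 6)
transmission = Entity("transmission", "Concept", 13)
sentence = [Relation(reduce_, masks, "subject"), Relation(reduce_, transmission, "target")]
print(count_facts([sentence, sentence]).most_common(1))
```

Keying by the full `Entity` rather than by its text means two separate occurrences of the same verb in one sentence are not merged into a single fact.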
