COVID-19 press conference search engine using BERT

07 Jun 2020 (modified: 05 May 2023) | Submitted to NLP-COVID-2020 | Readers: Everyone
Keywords: BERT, search engine, SBERT, COVID-19
TL;DR: Collecting transcripts of COVID-19 press conferences, selecting questions and answers, and building a search engine using BERT.
Abstract: Governments and agencies have held numerous press conferences on COVID-19 to present their efforts in fighting the pandemic. These briefings give reporters a platform to have their questions answered. This work studies press conferences from a range of governments and agencies, including the WHO, the White House, state governors, and other national governments. We collect the transcripts of these press conferences and apply a custom heuristic to select short exchanges between different speakers, thereby selecting the exchanges made by reporters. A custom-trained sentence classifier then selects the questions raised by reporters within these exchanges. The result is a new dataset containing the questions asked by reporters and how they were answered by officials. This dataset can be useful in a number of applications; in this work we present one of them, a search engine. The search engine is built on these questions by fine-tuning the state-of-the-art BERT language model on the collected COVID-19 press conference transcript dataset. It can help answer questions raised by the public and show how officials responded to them, and it can help reporters and researchers find how a specific question was answered by different governments. Our goal is to help organize the press questions concerning COVID-19 and to provide insight into the different efforts being taken to combat the pandemic.
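The retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the records in `QA_PAIRS` are made up for illustration, the function names are hypothetical, and `embed` is a toy bag-of-words stand-in for the sentence embedding that a fine-tuned BERT model would produce. Only the overall shape (embed the query, score it against every stored reporter question by cosine similarity, return the top matches with the officials' answers) is meant to carry over.

```python
import math
import re
from collections import Counter

# Hypothetical toy records: (reporter question, official answer).
QA_PAIRS = [
    ("When will a vaccine be available to the public?",
     "Trials are under way and timelines will be shared as data arrives."),
    ("Are schools going to reopen in the fall?",
     "That decision depends on local case counts."),
    ("How many tests are being run each day?",
     "Testing capacity is being expanded every week."),
]

def embed(text):
    """Stand-in embedding: a sparse bag-of-words count vector.

    Assumption: a BERT-based system would return a dense sentence
    vector here; the ranking logic below would stay the same.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(query, qa_pairs, top_k=2):
    """Rank stored questions by similarity to the query.

    Returns up to top_k tuples of (score, question, answer),
    highest score first.
    """
    query_vec = embed(query)
    scored = [
        (cosine(query_vec, embed(question)), question, answer)
        for question, answer in qa_pairs
    ]
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored[:top_k]
```

Calling `search("when is the vaccine available", QA_PAIRS, top_k=1)` returns the vaccine question together with its answer, since that stored question shares the most terms with the query. A dense sentence embedding would additionally match paraphrases that share no words, which the toy `embed` cannot do.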