NLP-based Feature Extraction for the Detection of COVID-19 Misinformation Videos on YouTube

Juan Carlos Medina Serrano; Orestis Papakyriakopoulos; Simon Hegelich

Published: 06 Jul 2020, Last Modified: 05 May 2023, NLP-COVID-2020

Keywords: YouTube, misinformation, conspiracy, detection, user comments

TL;DR: We classify conspiratorial user comments and then use their percentage per video as a feature to detect misinformation videos on YouTube.

Abstract: We present a simple NLP methodology for detecting COVID-19 misinformation videos on YouTube by leveraging user comments. We apply transfer learning with pre-trained models to build a multi-label classifier that categorizes conspiratorial content. We then use the percentage of misinformation comments on each video as a new feature for video classification. We show that including this feature in simple models yields an accuracy of up to 82.2%. Furthermore, we verify the significance of the feature with a Bayesian analysis. Finally, we show that adding the first hundred comments as tf-idf features raises the video classifier's accuracy to up to 89.4%.
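A minimal sketch of the two comment-based video features described in the abstract, written in plain Python. It assumes each video is given as a list of comment strings in the order they were retrieved; the keyword list and the helpers `conspiracy_percentage`, `tfidf_features`, and `video_features` are hypothetical. The paper builds its comment classifier with transfer-learning pre-trained models, so the keyword check below is only a placeholder for that model, and the downstream video classifier and the Bayesian analysis are not shown.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL stand-in for the paper's transfer-learning comment classifier:
# a comment is flagged if it contains any keyword from this made-up list.
CONSPIRACY_KEYWORDS = {"5g", "plandemic", "bioweapon", "hoax"}


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens (no underscores, so no clash with feature names)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_conspiratorial(comment: str) -> bool:
    return any(tok in CONSPIRACY_KEYWORDS for tok in tokenize(comment))


def conspiracy_percentage(comments: list[str]) -> float:
    """Percentage (0-100) of a video's comments flagged by the classifier."""
    if not comments:
        return 0.0
    flagged = sum(is_conspiratorial(c) for c in comments)
    return 100.0 * flagged / len(comments)


def tfidf_features(videos: list[list[str]], max_comments: int = 100) -> list[dict[str, float]]:
    """One sparse tf-idf vector per video, built from its first `max_comments` comments."""
    docs = [Counter(tok for c in comments[:max_comments] for tok in tokenize(c))
            for comments in videos]
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in doc)  # document frequency per term
    vectors = []
    for doc in docs:
        total = sum(doc.values()) or 1
        # tf = relative term frequency; idf uses the smoothed formula
        # ln((1 + n) / (1 + df)) + 1, which is scikit-learn's default idf.
        vectors.append({t: (cnt / total) * (math.log((1 + n_docs) / (1 + df[t])) + 1)
                        for t, cnt in doc.items()})
    return vectors


def video_features(videos: list[list[str]]) -> list[dict[str, float]]:
    """Combine the comment-percentage feature with the tf-idf features."""
    tfidf = tfidf_features(videos)
    return [{"conspiracy_pct": conspiracy_percentage(v), **vec}
            for v, vec in zip(videos, tfidf)]


videos = [
    ["5G towers spread the virus", "this is a hoax", "great video"],
    ["wash your hands", "thanks for the clear explanation"],
]
features = video_features(videos)
print([round(f["conspiracy_pct"], 1) for f in features])  # [66.7, 0.0]
```

Keeping the features as sparse dictionaries leaves the choice of downstream model open: any simple classifier that accepts sparse input could consume the percentage feature alone or together with the tf-idf terms.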
