Keywords: COVID-19, prediction, machine learning, XGBoost, survey data, US COVID-19 Trends and Impact Survey, symptoms, testing reason, demographics, behaviors, vaccination status
TL;DR: We predicted COVID-19 status from self-reported symptoms, demographics, and behaviors in the US COVID-19 Trends and Impact Survey. The best model (XGBoost, F1 ≈ 94%) demonstrates the utility of survey data when diagnostic tests are not available.
Abstract: Because the availability of RT-PCR testing for COVID-19 varies across time and location, there is a need for alternative methods of predicting COVID-19 case status. In this study, multiple machine learning (ML) models were trained on US COVID-19 Trends and Impact Survey (CTIS) data and assessed on their ability to predict COVID-19 case status. The CTIS includes information on testing, symptoms, demographics, behaviors, and vaccination status. The best-performing model was XGBoost, which achieved an F1 score of ~94% in predicting whether an individual was COVID-19 positive or negative. This is a notable improvement over existing models for predicting COVID-19 case status and demonstrates the potential of ML methods to provide policy-relevant estimates.
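
For readers unfamiliar with the reported metric, the F1 score is the harmonic mean of precision and recall for the positive class. The following is a minimal sketch of that calculation only; the helper name `f1_score` and the labels are hypothetical illustrations, not CTIS data or the study's evaluation code.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive predictions: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (assumption): 1 = COVID-19 positive, 0 = negative.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]

score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")  # prints "F1 = 0.80"
```

In this toy case there are 4 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.8 and F1 is 0.8. Because F1 ignores true negatives, it is less affected than raw accuracy by how many negative cases a sample contains.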