Early Detection of Depression Using Machine Learning and Social Well-Being Survey Data

Alex X. Wang, Binh P. Nguyen, Tom Elliott, James F. Mbinta, Andrew Sporle, Colin R. Simpson

Published: 01 Jan 2024, Last Modified: 22 Jun 2025, ICCAE 2024, CC BY-SA 4.0

Abstract: This study aims to develop an explainable machine learning (ML) model that predicts depression from survey data and explains the risk factors behind this outcome. A LightGBM model was trained on publicly available survey data (n=1,382) to predict depression, and statistical models were used to explain the prediction results. Model performance was assessed using the area under the receiver operating characteristic curve (ROC-AUC) and the area under the precision-recall curve (PR-AUC) across 30 trials of 5-fold nested cross-validation. An explainable ML model was built on the most important features, and SHapley Additive exPlanations (SHAP) analyses were performed. The proposed LightGBM model achieved an average accuracy of 82.74%, a sensitivity of 74.74%, and a specificity of 85.29% in detecting depression. The average ROC-AUC of 89.74% and average PR-AUC of 73.52% confirmed that the LightGBM model could effectively detect people with depression. Small variations in ROC-AUC and PR-AUC among the 30 trials indicated that our model was stable and robust. Our final model revealed a strong correlation between depression and COVID-19, mainly through the behavioural and emotional changes caused by the lockdown. We observed that sleep quality, personal affect, and demographic characteristics were the key predictors of depression. The model accurately predicted the incidence of depression and explained the risk factors leading to depression. The accuracy, simplicity, and interpretability of the final model suggest that it has potential application in routine clinical practice to assist depression self-assessment and diagnosis explanation. These results may help guide future explainable ML model development in other therapeutic areas.
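The accuracy, sensitivity, specificity, ROC-AUC, and PR-AUC figures reported in the abstract follow standard definitions. As a minimal plain-Python sketch (not the authors' code), the helpers below compute them from binary labels (1 = depressed) and model scores. The 0.5 decision threshold, the helper names, and the use of step-wise average precision as the PR-AUC estimate are assumptions, since the abstract does not say how these quantities were computed.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple:
    """Return (TP, FP, TN, FN) for binary labels, where 1 = depressed."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def threshold_metrics(y_true: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> dict:
    """Accuracy, sensitivity and specificity after thresholding the scores."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of depressed people detected
        "specificity": tn / (tn + fp),  # share of non-depressed people correctly cleared
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise PR-AUC estimate: mean precision at the rank of each positive.

    Tied scores keep their input order (Python's sort is stable).
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / sum(y_true)
```

On labels [1, 0, 1, 0, 0, 1] with scores [0.9, 0.8, 0.7, 0.3, 0.2, 0.4], these helpers give an accuracy of 4/6, a sensitivity and specificity of 2/3 each, a ROC-AUC of 7/9, and an average precision of 29/36. The pairwise ROC-AUC is quadratic in the number of samples, which is still cheap at n=1,382.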
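Nested cross-validation keeps hyperparameter selection inside each outer training split, so the outer test fold never influences model tuning. The sketch below repeats 5-fold nested cross-validation 30 times and returns one mean outer-fold score per repeat. The spread of these per-repeat scores corresponds to the trial-to-trial variation the abstract uses to argue stability. It assumes stratified folds, and it swaps in a toy 1-D k-nearest-neighbour classifier (`knn_predict`, hypothetical) with a grid over k as a stand-in for LightGBM and its hyperparameters. All function names are illustrative.

```python
import random
from statistics import mean, stdev


def stratified_folds(indices, y, k, rng):
    """Split `indices` into k folds, dealing each class out round-robin."""
    folds = [[] for _ in range(k)]
    for label in sorted({y[i] for i in indices}):
        members = [i for i in indices if y[i] == label]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def cv_score(folds, X, y, param, fit_predict, metric):
    """Mean metric over the given folds for one hyperparameter value."""
    scores = []
    for j, test in enumerate(folds):
        train = [i for m, fold in enumerate(folds) if m != j for i in fold]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test], param)
        scores.append(metric([y[i] for i in test], preds))
    return mean(scores)


def repeated_nested_cv(X, y, params, fit_predict, metric,
                       outer_k=5, inner_k=5, repeats=30, seed=0):
    """Return one mean outer-fold score per repeat of nested CV."""
    rng = random.Random(seed)
    all_idx = list(range(len(y)))
    per_repeat = []
    for _ in range(repeats):
        outer_scores = []
        outer = stratified_folds(all_idx, y, outer_k, rng)
        for j, test in enumerate(outer):
            train = [i for m, fold in enumerate(outer) if m != j for i in fold]
            # Inner loop: tune on the outer-training part only; `test` is never seen here.
            inner = stratified_folds(train, y, inner_k, rng)
            best = max(params, key=lambda p: cv_score(inner, X, y, p, fit_predict, metric))
            preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                                [X[i] for i in test], best)
            outer_scores.append(metric([y[i] for i in test], preds))
        per_repeat.append(mean(outer_scores))
    return per_repeat


def knn_predict(X_train, y_train, X_test, k):
    """Hypothetical stand-in model: majority vote of the k nearest 1-D neighbours."""
    preds = []
    for x in X_test:
        nearest = sorted(range(len(X_train)), key=lambda i: abs(X_train[i] - x))[:k]
        preds.append(1 if 2 * sum(y_train[i] for i in nearest) > len(nearest) else 0)
    return preds


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy data: two well-separated classes of ten points each.
    X = [float(v) for v in range(10)] + [float(v) for v in range(20, 30)]
    y = [0] * 10 + [1] * 10
    scores = repeated_nested_cv(X, y, params=[1, 3], fit_predict=knn_predict, metric=accuracy)
    print(f"mean={mean(scores):.3f}, sd={stdev(scores):.3f} over {len(scores)} repeats")
```

Because the toy classes are perfectly separated, the demo reports a mean of 1.000 and a standard deviation of 0.000. On real survey data, the per-repeat scores would differ, and their standard deviation is one simple way to quantify how stable a model is across repeated splits.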
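SHAP explains an individual prediction by assigning each feature a Shapley value. This value is the feature's weighted average marginal contribution over all subsets S of the other features: phi_i = sum over S in N \ {i} of |S|! (n - |S| - 1)! / n! * [v(S ∪ {i}) - v(S)]. The brute-force sketch below only illustrates that formula. It is not the SHAP library or the authors' pipeline. It assumes that "absent" features are replaced by a single baseline vector. In contrast, the `shap` package averages over background data or, for tree ensembles such as LightGBM, uses the TreeSHAP algorithm. The helper `exact_shapley` and the toy `score` function are hypothetical.

```python
from itertools import combinations
from math import factorial, isclose
from typing import Callable, Sequence


def exact_shapley(f: Callable[[list], float], x: Sequence[float],
                  baseline: Sequence[float]) -> list:
    """Brute-force Shapley values of f at x relative to a single baseline point.

    Enumerates every coalition, so the cost grows as 2**n; only for toy n.
    """
    n = len(x)

    def value(coalition: set) -> float:
        # Features in the coalition keep their observed value; the rest use the baseline.
        return f([x[j] if j in coalition else baseline[j] for j in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    # Hypothetical risk score with an interaction between features 1 and 2.
    def score(z):
        return 0.5 * z[0] + z[1] * z[2]

    x, base = [1.0, 1.0, 2.0], [0.0, 0.0, 0.0]
    phi = exact_shapley(score, x, base)
    # Efficiency property: attributions sum to f(x) - f(baseline).
    assert isclose(sum(phi), score(x) - score(base))
    print(phi)
```

For the toy score, the attributions come out as approximately 0.5, 1.0, and 1.0. The linear term is credited entirely to feature 0, and the interaction term z[1] * z[2] is split equally between features 1 and 2.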