Patient Risk Prediction Model via Top-k Stability Selection

Jiayu Zhou, Jimeng Sun, Yashu Liu, Jianying Hu, Jieping Ye

19 Jan 2023, OpenReview Archive Direct Upload, Readers: Everyone

Abstract: The patient risk prediction model aims at assessing the risk of a patient in developing a target disease based on his/her health profile. As electronic health records (EHRs) become more prevalent, a large number of features can be constructed in order to characterize patient profiles. This wealth of data provides unprecedented opportunities for data mining researchers to address important biomedical questions. Practical data mining challenges include: How to correctly select and rank those features based on their prediction power? What predictive model performs the best in predicting a target disease using those features? In this paper, we propose top-k stability selection, which generalizes a powerful sparse learning method for feature selection by overcoming its limitation on parameter selection. In particular, our proposed top-k stability selection includes the original stability selection method as a special case given k = 1. Moreover, we show that the top-k stability selection is more robust by utilizing more information from selection probabilities than the original stability selection, and provides stronger theoretical properties. In a large set of real clinical prediction datasets, the top-k stability selection methods outperform many existing feature selection methods including the original stability selection. We also compare three competitive classification methods (SVM, logistic regression and random forest) to demonstrate the effectiveness of selected features by our proposed method in the context of clinical prediction applications. Finally, through several clinical applications on predicting heart failure related symptoms, we show that top-k stability selection can successfully identify important features that are clinically meaningful.
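
To make the k = 1 special case concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give the scoring rule, so the sketch assumes that a feature's top-k score is the mean of its k largest selection probabilities across regularization values, which reduces to the maximum (the usual stability selection score) when k = 1. The function name `top_k_stability_scores`, the feature names, and the probability values are all hypothetical.

```python
from typing import Dict, List


def top_k_stability_scores(
    selection_probs: Dict[str, List[float]], k: int
) -> Dict[str, float]:
    """Score each feature by the mean of its k largest selection probabilities.

    selection_probs maps a feature name to its selection probabilities,
    one per regularization value (assumed to be estimated beforehand,
    e.g. by refitting a sparse model on random subsamples).
    ASSUMPTION: the averaging rule is illustrative; the abstract does not
    state the exact aggregation used by the paper.
    """
    scores = {}
    for feature, probs in selection_probs.items():
        if not 1 <= k <= len(probs):
            raise ValueError("k must be between 1 and the number of regularization values")
        # Keep the k largest probabilities and average them.
        top = sorted(probs, reverse=True)[:k]
        scores[feature] = sum(top) / k
    return scores


# Hypothetical selection probabilities over three regularization values.
probs = {
    "feature_a": [0.9, 0.2, 0.1],  # selected often at one value only
    "feature_b": [0.7, 0.7, 0.6],  # selected consistently across values
}

max_scores = top_k_stability_scores(probs, k=1)   # maximum over the path
top2_scores = top_k_stability_scores(probs, k=2)  # mean of the two largest
```

Under these made-up numbers, k = 1 ranks `feature_a` first, while k = 2 ranks `feature_b` first because it is selected consistently across regularization values. This is one way to read the abstract's claim that using more information from the selection probabilities gives a more robust ranking.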
