An Oversampling-Enhanced Multi-Class Imbalanced Classification Framework for Patient Health Status Prediction Using Patient-Reported Outcomes

Published: 2025, Last Modified: 03 Feb 2026, Venue: IEEE Access 2025, License: CC BY-SA 4.0
Abstract: Patient-reported outcomes (PROs), directly captured from cancer patients undergoing radiation therapy, play a crucial role in guiding clinicians’ counseling on treatment-related toxicities. Accurate prediction and assessment of symptoms and health status linked to PROs are essential for improving clinical decision-making and planning post-treatment support as patients transition into survivorship. However, raw PRO data collected in clinical settings presents two inherent challenges: data sparsity (due to incomplete item responses) and imbalanced toxicity distributions, both of which complicate predictive modeling. This study investigates machine learning techniques to address these challenges by predicting outcomes such as pain and sleep disturbances using PRO datasets from a cancer therapy center. We implement six classifiers, namely random forest (RF), XGBoost, gradient boosting (GB), support vector machine (SVM), MLP-Bagging, and logistic regression (LR), for multi-class imbalanced classification tasks across three cancer types. To address the minority classes, we apply oversampling while preserving class ratios. Experimental results demonstrate RF and XGBoost’s strong generalization, highlighting their utility in categorizing post-therapy severity levels for clinical decision support.
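As an illustration of the oversampling step mentioned in the abstract, the following is a minimal sketch of random oversampling for a multi-class label. The abstract does not name the oversampling algorithm or say how class ratios are preserved, so everything here is an assumption: the helper `random_oversample`, its `target_ratio` parameter, and the toy severity labels are hypothetical, and plain sampling with replacement stands in for whatever method the authors used.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, labels, target_ratio=0.5, seed=0):
    """Duplicate minority-class rows (sampling with replacement) until each
    class has at least target_ratio * (majority class size) examples.

    Hypothetical helper for illustration; not the paper's procedure.
    """
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    majority_size = max(len(members) for members in by_class.values())
    floor_size = int(target_ratio * majority_size)

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Keep every original row, then top up small classes with resampled copies.
        extra = max(0, floor_size - len(members))
        resampled = members + [rng.choice(members) for _ in range(extra)]
        out_rows.extend(resampled)
        out_labels.extend([label] * len(resampled))
    return out_rows, out_labels


# Toy severity labels (0 = none, 1 = mild, 2 = severe); the counts are made up.
rows = [[i] for i in range(100)]
labels = [0] * 80 + [1] * 15 + [2] * 5
new_rows, new_labels = random_oversample(rows, labels, target_ratio=0.5)
print(Counter(new_labels))
```

In a setup like this, oversampling would normally be applied to the training split only, so that duplicated minority rows do not leak into the data used for evaluation.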