Comparative Analysis of Machine Learning Models for Obstructive Sleep Apnea and Hypopnea Detection Using Pulse Oximetry and Heart Rate Variability

Published: 19 Aug 2025, Last Modified: 12 Oct 2025
Venue: BHI 2025
License: CC BY 4.0
Confirmation: I have read and agree with the IEEE BHI 2025 conference submission's policy on behalf of myself and my co-authors.
Keywords: Obstructive Sleep Apnea, Machine Learning, Pulse Oximetry, Heart Rate Variability, Polysomnography
Abstract: Obstructive sleep apnea and hypopnea syndrome (OSAHS) is a significantly underdiagnosed condition that can lead to dangerous and sometimes life-threatening complications such as heart failure, stroke, and sudden cardiac death. Traditional diagnostic methods for OSAHS, such as polysomnography, are resource-intensive and not readily accessible for large-scale screening. In this study, we compared the efficacy of machine learning (ML) algorithms in detecting OSAHS from non-invasive physiological data (pulse oximetry and heart rate variability) that can be recorded using wearable sensors, in a large dataset of 6399 recordings (53% women, mean age 62±13 years). The ML algorithms were trained and tuned using nested cross-validation on the training set (80% of the dataset) and separately validated on an independent test set (the remaining 20%) to assess the generalizability of model performance. Furthermore, we investigated how sampling frequency, available data length, and noise in the physiological signals affect ML performance, to understand the impact of real-world constraints on OSAHS detection. We also explored model explainability with SHapley Additive exPlanations (SHAP) and an ablation study to enhance the clinical interpretation of the results. Among the algorithms compared (Random Forest, Support Vector Machine, eXtreme Gradient Boosting, Multi-Layer Perceptron, etc.), eXtreme Gradient Boosting performed best, with an F1-score of 0.896±0.012 on the cross-validated training set and 0.897 on the independent test set. Its performance on the independent test set deteriorated as less data became available, with F1-scores of 0.897, 0.89, 0.887, 0.885, and 0.879 for recording lengths of eight hours (full night), four hours, two hours, one hour, and 30 minutes, respectively. Performance was highest with pulse oximetry data sampled at 0.5 Hz, compared with 1 Hz and 0.25 Hz. The findings highlight the potential of ML-driven analysis of unobtrusive physiological signals for scalable OSAHS screening and the need to consider real-world constraints on ML algorithm performance.
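The evaluation protocol described in the abstract (an 80/20 split, with nested cross-validation on the training portion) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' pipeline: the data are synthetic, the model is a toy one-feature threshold classifier rather than eXtreme Gradient Boosting, and the helper names (`fit`, `evaluate`, `cv_pairs`, `tune`, `nested_cv`) and the fold counts are hypothetical.

```python
import random
from statistics import mean

def f1(y_true, y_pred):
    """F1-score for binary labels (1 = positive class)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def fit(xs, ys):
    """Toy 'training' step: midpoint between the two class means."""
    pos = [x for x, t in zip(xs, ys) if t == 1]
    neg = [x for x, t in zip(xs, ys) if t == 0]
    return (mean(pos) + mean(neg)) / 2

def evaluate(x, y, train_idx, eval_idx, offset):
    """Fit on train_idx, shift the cut by the hyperparameter, score on eval_idx."""
    cut = fit([x[i] for i in train_idx], [y[i] for i in train_idx]) + offset
    y_pred = [int(x[i] >= cut) for i in eval_idx]
    return f1([y[i] for i in eval_idx], y_pred)

def cv_pairs(indices, k):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    parts = [indices[i::k] for i in range(k)]
    for j, val in enumerate(parts):
        train = [i for m, part in enumerate(parts) if m != j for i in part]
        yield train, val

def tune(x, y, indices, grid, k):
    """Inner loop: choose the offset with the best mean validation F1."""
    def score(offset):
        return mean(evaluate(x, y, tr, va, offset)
                    for tr, va in cv_pairs(indices, k))
    return max(grid, key=score)

def nested_cv(x, y, indices, grid, outer_k=5, inner_k=3):
    """Outer loop: tune on each outer-training fold, score on the held-out fold."""
    scores = []
    for tr, te in cv_pairs(indices, outer_k):
        best = tune(x, y, tr, grid, inner_k)
        scores.append(evaluate(x, y, tr, te, best))
    return scores

# Synthetic stand-in data: one feature, two balanced classes (assumption).
rng = random.Random(0)
y = [i % 2 for i in range(200)]
x = [rng.gauss(1.0 if label else 0.0, 0.5) for label in y]

# 80% development set for nested CV, 20% untouched test set.
idx = list(range(200))
rng.shuffle(idx)
dev_idx, test_idx = idx[:160], idx[160:]

grid = [-0.2, -0.1, 0.0, 0.1, 0.2]  # hypothetical hyperparameter grid
cv_scores = nested_cv(x, y, dev_idx, grid)

# Final model: tune on the whole development set, then score once on the test set.
final_offset = tune(x, y, dev_idx, grid, 3)
test_f1 = evaluate(x, y, dev_idx, test_idx, final_offset)
print(f"nested-CV F1 (mean over outer folds): {mean(cv_scores):.3f}")
print(f"held-out test F1: {test_f1:.3f}")
```

The point of the nesting is that hyperparameters are chosen only from inner folds, so each outer-fold score, and the final test score, comes from data that played no part in tuning.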
Track: 2. Bioinformatics
Registration Id: X4NGWBX7B6X
Submission Number: 358