Editorial: Pattern recognition for healthcare analytics

Inci M. Baytas; Yifan Peng; Arzucan Özgür

Published: 01 Jan 2023, Last Modified: 21 Feb 2025. Frontiers Digit. Health 2023. CC BY-SA 4.0

Abstract: Electronic Health Record (EHR) systems are widely employed at hospitals to register patient information in various data types, including text. Since patient data is sensitive, the privacy of the information in EHRs should be protected. Therefore, de-identification techniques can be applied before circulating EHR data to ensure patient privacy. Paul et al. explored this problem in their study titled "Investigation of the Utility of Features in a Clinical De-identification Model: A Demonstration Using EHR Pathology Reports for Advanced NSCLC Patients". The authors utilized open-source Natural Language Processing (NLP) toolkits to explore various text representation techniques for extracting features from EHR text data. The best-performing features and their combinations were used to train a Named Entity Recognition (NER) model, which identifies clinical entities of interest in the text. The authors suggested n-gram, prefix-suffix, word embeddings, and word shape as the best-performing features for improving recall in the NER task.
EHRs are invaluable resources for studying the contributing factors of various conditions. For instance, Mahabadi et al. investigated the impact of the physical characteristics of the urban environment on Severe Mental Illnesses (SMI) using EHR data in their study titled "Evaluating Physical Urban Features in Several Mental Illnesses using Electronic Health Record Data". The authors considered 28 urban and 6 clinical features from a cohort of 30,210 patients. At this scale, a clinician cannot draw inferences manually from the data. The authors addressed this challenge by leveraging the interpretability of LASSO regression to obtain the most significant features. They also suggested employing the Self-Organising Map technique to interpret the results visually.
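To illustrate why LASSO regression lends itself to feature selection, the following is a minimal sketch and not the implementation used by Mahabadi et al. It assumes a plain cyclic coordinate-descent solver, synthetic toy data, and an arbitrary penalty strength; the function names `soft_threshold` and `lasso_coordinate_descent` are hypothetical. The L1 penalty drives the coefficients of uninformative features to exactly zero, so the surviving non-zero coefficients can be read as the selected features.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the residual excluding feature j
            z = 0.0    # sum of squares of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            # Closed-form update for one coordinate under the L1 penalty.
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    return w


# Synthetic toy data (an assumption, not from the study):
# only features 0 and 2 influence the outcome; features 1, 3, 4 are noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y = [3.0 * row[0] - 2.0 * row[2] + rng.gauss(0, 0.1) for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.2)
selected = [j for j, w_j in enumerate(weights) if w_j != 0.0]
print("selected feature indices:", selected)
```

In practice the penalty strength would be chosen by cross-validation, and the features would be standardised first so that the penalty treats them comparably.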
Predicting disease progression is vital for neurodegenerative diseases, such as Alzheimer's disease (AD). Hason and Krishnan focused on the early diagnosis and progression of AD based on speech signals in their study titled "Spontaneous Speech Feature Analysis for Alzheimer's Disease Screening using a Random Forest Classifier". Depending on the healthcare task, specific requirements for feature extraction emerge. This study aimed to discriminate AD patients from cognitively normal patients using acoustic input. The authors suggested exploring the non-stationarity and non-linearity properties of the audio features to determine the most significant ones, which are then fed to a random forest classifier to detect AD.
The studies mentioned above make evident that representing patients using salient and informative features is crucial for training successful machine learning models for healthcare tasks. Hernandez et al. studied […]
In summary, this special issue presents various data-driven approaches to solving different healthcare tasks with diverse types of patient data. All the studies emphasize that determining the significant features for the specific task at hand improves the performance of data-driven models.
İnci M. Baytaş, Yifan Peng, Arzucan Özgür
