Abstract: In this paper, we demonstrate a novel pipeline for identifying and extracting patient subpopulations from unstructured physicians’ notes. We validate the method by extracting patients with psychiatric issues from a general patient population. The pipeline first uses a clinical metathesaurus to select terms of interest from the reports and then vectorizes those terms with a transformer model. The dimensionality of the resulting vectors is reduced with Uniform Manifold Approximation and Projection (UMAP), and the reduced vectors are grouped using optimal cluster selection methods. We demonstrate this technique on a freely available collection of de-identified patient notes (MIMIC-IV), extracting and clustering “mental or behavioral dysfunctions”. Our results show that it is possible to select user-defined groups of patients from unstructured text and to group patients with similar profiles with minimal model oversight. In our study cohort, the models automatically segmented the patients into two groups: patients with more physical symptoms (alcohol/drug abuse, dysarthria, tongue-biting, eating disorders) and patients with mental/emotional symptoms. Because the method detects underlying similarities in patient profiles, we believe it can be used for symptom prediction tasks and for curating treatment plans based on patients’ cluster profiles. Such a system can assist clinical decision-making without the need for individually created NLP models.
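The abstract does not specify which "optimal cluster selection method" groups the UMAP output. As an illustration only, the sketch below assumes k-means clustering with the number of clusters chosen by the highest mean silhouette score, applied to vectors that have already been reduced with UMAP. The functions `kmeans`, `silhouette`, and `select_k` are hypothetical names. The standard-library implementation favors readability over scale; a real pipeline would more likely rely on library implementations.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: List[Point], k: int, seed: int = 0, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means with k-means++ seeding; returns one cluster label per point."""
    rng = random.Random(seed)
    # k-means++: first centroid uniformly at random, later ones weighted by squared distance.
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centroids.append(list(p))
                break
        else:  # guard against floating-point round-off
            centroids.append(list(points[-1]))
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def silhouette(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient; returns -1.0 if fewer than two clusters are present."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for i, p in enumerate(points):
        # a: mean distance to the other members of p's own cluster.
        own = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        a = sum(own) / len(own)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


def select_k(points: List[Point], k_values=range(2, 7), seed: int = 0) -> Tuple[int, float, List[int]]:
    """Try each candidate k and keep the clustering with the highest mean silhouette."""
    best: Optional[Tuple[int, float, List[int]]] = None
    for k in k_values:
        labels = kmeans(points, k, seed=seed)
        score = silhouette(points, labels)
        if best is None or score > best[1]:
            best = (k, score, labels)
    return best
```

Here, `points` stands in for the low-dimensional UMAP coordinates. Silhouette-based selection is one common way to let the data choose the number of groups without manual tuning. Density-based methods such as HDBSCAN are another option, and this sketch does not claim to reflect the authors' actual choice.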