Keywords: Negative Samples, Cohort Discovery, Healthcare Analytics
TL;DR: In this paper, we bridge the research gap caused by the asymmetry between positive and negative samples in healthcare analytics by exploring negative samples for cohort discovery.
Abstract: Healthcare analytics, particularly binary diagnosis or prognosis problems, presents unique challenges due to the inherent asymmetry between positive and negative samples. While positive samples, representing patients who develop a disease, are defined through rigorous medical criteria, negative samples are defined in an open-ended manner, resulting in a vast potential set. Despite this fundamental asymmetry, previous research has underexplored the role of negative samples, possibly due to the enormous challenge of investigating an infinitely large negative sample space. To bridge this gap, we propose an approach to facilitate cohort discovery within negative samples, which could yield valuable insights into the studied disease, as well as its comorbidities and complications. We measure each sample’s contribution using data Shapley values and construct the Negative Sample Shapley Field to model the distribution of all negative samples. We then transform this field via manifold learning, preserving the data structure information while imposing an isotropy constraint on the data Shapley values. Within this transformed space, we identify cohorts of medical interest through density-based clustering. We empirically evaluate the effectiveness of our approach on our hospital’s electronic medical records. Clinicians validate the medical insights revealed in the discovered cohorts, confirming that the approach unveils meaningful insights consistent with existing domain knowledge and can thereby support medical research and well-informed clinical decision-making.
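To make the first step of the pipeline concrete, the following is a minimal sketch of how per-sample data Shapley values might be estimated by Monte Carlo permutation sampling. It is not the authors' implementation: the toy data, the 1-nearest-neighbour utility, the convention that an empty training set has utility 0, and the names `utility` and `monte_carlo_shapley` are all assumptions made for illustration. The Negative Sample Shapley Field, the manifold learning step, and the density-based clustering are not sketched here.

```python
import random

def utility(train, val):
    """Validation accuracy of a 1-nearest-neighbour classifier (an assumed utility)."""
    if not train:
        return 0.0  # assumed convention: no training data means zero utility
    correct = 0
    for x, y in val:
        # Nearest training sample by squared Euclidean distance.
        nearest = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
        correct += nearest[1] == y
    return correct / len(val)

def monte_carlo_shapley(train, val, n_perms=200, seed=0):
    """Average each sample's marginal contribution over random orderings."""
    rng = random.Random(seed)
    n = len(train)
    values = [0.0] * n
    for _ in range(n_perms):
        order = list(range(n))
        rng.shuffle(order)
        subset = []
        prev = utility(subset, val)
        for i in order:
            subset.append(train[i])
            cur = utility(subset, val)
            values[i] += cur - prev  # marginal contribution of sample i
            prev = cur
    return [v / n_perms for v in values]

# Hypothetical toy data: (features, label), with label 1 = positive, 0 = negative.
train = [((0.0, 0.0), 0), ((0.2, 0.1), 0), ((1.0, 1.0), 1),
         ((0.9, 1.1), 1), ((0.95, 0.9), 0)]
val = [((0.1, 0.0), 0), ((1.0, 0.9), 1), ((0.1, 0.2), 0), ((0.9, 1.0), 1)]

values = monte_carlo_shapley(train, val)

# Keep only the negative samples, the population studied for cohort discovery.
negative_values = {i: v for i, v in enumerate(values) if train[i][1] == 0}
for i, v in sorted(negative_values.items()):
    print(f"negative sample {i}: estimated Shapley value {v:+.3f}")
```

Because the marginal contributions telescope within each permutation, the estimated values sum to the utility of the full training set minus that of the empty set, which gives a quick sanity check on any such estimator.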
Supplementary Material: zip
Submission Number: 9772