Exploiting Negative Samples: A Catalyst for Cohort Discovery in Healthcare Analytics

23 Sept 2023 (modified: 11 Feb 2024) | Submitted to ICLR 2024
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Negative Samples, Cohort Discovery, Healthcare Analytics
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We address the underexplored asymmetry between positive and negative samples in healthcare analytics by using negative samples for cohort discovery.
Abstract: In healthcare analytics, particularly in binary diagnosis or prognosis tasks, unique challenges arise from the inherent asymmetry between positive and negative samples. Positive samples, denoting patients who develop a disease, are defined by stringent medical criteria. In contrast, negative samples are defined in an open-ended manner, leading to a vast potential set. Despite this fundamental asymmetry, the role of negative samples remains underexplored in prior research, possibly because of the enormous challenge of investigating an infinitely large negative sample space. To bridge this gap, we propose an approach to cohort discovery within negative samples that leverages a Shapley-based exploration of the interrelationships between these samples. This exploration holds promise for uncovering valuable insights concerning the studied disease and its related comorbidities and complications. We first quantify each sample’s contribution using data Shapley values and then construct the Negative Sample Shapley Field to model the distribution of all negative samples. Next, we transform this field through manifold learning, preserving the essential data structure information while imposing an isotropy constraint on the data Shapley values. Within this transformed space, we pinpoint cohorts of medical interest via density-based clustering. We empirically evaluate the effectiveness of our approach on our hospital’s electronic medical records. Clinicians validated the medical insights derived from the discovered cohorts, affirming that our approach unveils meaningful insights that align with existing domain knowledge and can thereby support medical research and well-informed clinical decision-making.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7721
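
The abstract's first step, quantifying each sample's contribution with data Shapley values, can be illustrated with a minimal sketch. The abstract does not say how the values are estimated, so everything below is an assumption: the sketch uses plain Monte Carlo permutation sampling, a nearest-centroid classifier as a stand-in model, validation accuracy as the utility, and synthetic two-dimensional data. The names `utility` and `monte_carlo_data_shapley` are hypothetical and do not come from the submission. The later steps (constructing the Negative Sample Shapley Field, manifold learning, density-based clustering) are not shown.

```python
import numpy as np


def utility(X_tr, y_tr, X_val, y_val):
    """Validation accuracy of a nearest-centroid classifier (assumed stand-in model)."""
    if len(y_tr) == 0:
        return 0.5  # assumed chance-level utility of the empty training set
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every validation point to every centroid.
    dists = ((X_val[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    preds = classes[dists.argmin(axis=1)]
    return float((preds == y_val).mean())


def monte_carlo_data_shapley(X, y, X_val, y_val, n_perms=50, seed=0):
    """Estimate data Shapley values by averaging marginal contributions
    over random permutations of the training samples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    values = np.zeros(n)
    for _ in range(n_perms):
        perm = rng.permutation(n)
        prev = utility(X[:0], y[:0], X_val, y_val)
        for k, i in enumerate(perm, start=1):
            idx = perm[:k]
            cur = utility(X[idx], y[idx], X_val, y_val)
            values[i] += cur - prev  # marginal contribution of sample i
            prev = cur
    return values / n_perms


# Synthetic data (purely illustrative): label 1 = positive, label 0 = negative.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(2.0, 1.0, (10, 2)), rng.normal(-2.0, 1.0, (20, 2))])
y = np.array([1] * 10 + [0] * 20)
X_val = np.vstack([rng.normal(2.0, 1.0, (15, 2)), rng.normal(-2.0, 1.0, (15, 2))])
y_val = np.array([1] * 15 + [0] * 15)

values = monte_carlo_data_shapley(X, y, X_val, y_val)
neg_values = values[y == 0]  # per-sample values for the negative samples only
```

In this sketch, `neg_values` holds one estimated contribution per negative sample; in the pipeline the abstract describes, such values would feed the subsequent field construction and clustering. Because each permutation's marginal contributions telescope, the estimates sum to the utility of the full training set minus the utility of the empty set.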