Demo: Combining Polygenic Risk and EHR Survival Models of Cardiac Patients in the Indian Context via Knowledge Distillation
Keywords: Polygenic Risk Score (PRS); Coronary Artery Disease (CAD); Knowledge Distillation; Survival Analysis; Electronic Health Records (EHR); Censoring-Aware Learning; IndiGenomes Dataset; Population Genomics; Allele Frequency Recalibration; Precision Cardiology; Indian Population
Abstract: Polygenic risk scores (PRS) improve coronary artery disease (CAD) prediction in many populations but are less portable and less well calibrated when applied to underrepresented ancestries. We present a practical pipeline that transfers population-adapted genomic risk into a deployable electronic-health-record (EHR) survival model through censoring-aware knowledge distillation. Using IndiGenomes allele-frequency data to recalibrate GWAS-derived PRS for the Indian population, we build a genomic teacher that outputs risk scores and survival curves. These soft targets are distilled into a parsimonious Cox/AFT student that uses only routine clinical features collected from a longitudinal Maharashtra EHR cohort (N$\simeq$5,000). The student is trained with a combined loss that blends the Cox partial likelihood with an IPCW-weighted distillation term, so that censored observations are accounted for rather than discarded. In cross-validated experiments, the distilled student improves discrimination and net reclassification compared with an equivalent EHR model trained without distillation, while retaining clinical interpretability and deployability. Ablations show that recalibration using IndiGenomes allele frequencies substantially enhances the benefit of distillation, underscoring the importance of population-specific genomic adaptation. Our approach provides a scalable route to embedding genomic risk structure into routine clinical tools in resource-constrained settings.
Submission Number: 17
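Illustrative sketch: the abstract describes a combined loss that blends the Cox partial likelihood with an IPCW-weighted distillation term. The NumPy code below is one minimal way such a loss could be written, not the authors' implementation. Everything beyond "Cox term plus IPCW-weighted distillation term" is an assumption made for illustration: the function names, the squared-error form of the distillation term, the mixing weight `alpha`, and the weighting scheme `event / G(T-)`, where `G` is a Kaplan-Meier estimate of the censoring distribution.

```python
import numpy as np

def cox_neg_partial_loglik(eta, time, event):
    """Negative Cox partial log-likelihood, averaged over events.
    Assumes no tied event times (ties are not specially handled)."""
    order = np.argsort(-time)                 # descending time
    eta, event = eta[order], event[order]
    # After sorting, the risk set of subject i is the prefix 0..i,
    # so the running log-sum-exp gives log(sum_{j in risk set} exp(eta_j)).
    log_risk = np.logaddexp.accumulate(eta)
    return -np.sum((eta - log_risk) * event) / max(event.sum(), 1)

def censoring_survival_left(time, event):
    """Kaplan-Meier estimate of the censoring survival function G,
    evaluated just before each subject's own time, G(T_i-).
    Quadratic-time loop kept for readability."""
    censored = 1 - event
    g = np.ones(len(time), dtype=float)
    for i, t in enumerate(time):
        for u in np.unique(time[(censored == 1) & (time < t)]):
            at_risk = np.sum(time >= u)
            d = np.sum((time == u) & (censored == 1))
            g[i] *= 1.0 - d / at_risk
    return g

def ipcw_weights(time, event, floor=1e-3):
    """Assumed IPCW scheme: observed events are up-weighted by 1 / G(T_i-);
    censored subjects get weight 0 in the distillation term but still
    shape G and the Cox risk sets. `floor` is a hypothetical guard."""
    g = np.maximum(censoring_survival_left(time, event), floor)
    return event / g

def combined_loss(eta_student, teacher_score, time, event, alpha=0.5):
    """(1 - alpha) * Cox term + alpha * IPCW-weighted squared error
    between the student's linear predictor and the teacher's risk score."""
    w = ipcw_weights(time, event)
    distill = np.sum(w * (eta_student - teacher_score) ** 2) / max(w.sum(), 1e-12)
    cox = cox_neg_partial_loglik(eta_student, time, event)
    return (1.0 - alpha) * cox + alpha * distill

# Toy data, invented purely to exercise the functions.
time = np.array([2.0, 3.0, 5.0, 6.0, 8.0])
event = np.array([1, 0, 1, 0, 1])
teacher = np.array([0.9, 0.1, 0.4, -0.2, -0.6])   # hypothetical teacher risk scores
student = np.array([0.7, 0.0, 0.5, -0.1, -0.4])   # hypothetical student predictors
loss = combined_loss(student, teacher, time, event, alpha=0.3)
```

With `alpha=0` the loss reduces to the plain Cox objective, which corresponds to the non-distilled EHR baseline mentioned in the abstract. A real student would minimise this loss with an autodiff framework rather than evaluating it on fixed arrays.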