An Empirical Study of ML-based Phenotyping and Denoising for Improved Genomic Discovery

09 Oct 2022 (modified: 05 May 2023), LMRL 2022 Paper, Readers: Everyone
Keywords: genomics, genetics, machine-learning-based phenotyping, denoising
TL;DR: We empirically study the ability of ML-based phenotyping methods to correct for noisy labels, quantifying the impact of corrupted labels on machine learning model performance and downstream genomic discovery.
Abstract: Genome-wide association studies (GWAS) are used to identify genetic variants significantly correlated with a target disease or phenotype, as a first step toward detecting potentially causal genes. The availability of high-dimensional biomedical data in population-scale biobanks has enabled novel machine-learning-based phenotyping approaches in which machine learning (ML) algorithms rapidly and accurately phenotype large cohorts with both genomic and clinical data, increasing the statistical power to detect variants associated with a given phenotype. While recent work has demonstrated that these methods can be extended to diseases for which only low-quality medical-record-based labels are available, it is not possible to quantify changes in statistical power, since the underlying ground-truth liability scores for the complex, polygenic diseases represented by these medical-record-based phenotypes are unknown. In this work, we empirically study the robustness of ML-based phenotyping procedures to label noise by applying varying levels of random noise to vertical cup-to-disc ratio (VCDR), a quantitative feature of the optic nerve that is predictable from color fundus imagery and strongly influences glaucoma referral risk. We show that the ML-based phenotyping procedure recovers the underlying liability score across noise levels, significantly improving genetic discovery and polygenic risk score (PRS) predictive power relative to the noisy equivalents. Furthermore, initial denoising experiments show promising preliminary results, suggesting that improving such methods will yield additional gains.
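To illustrate why label noise matters for genetic discovery, the following minimal sketch simulates a single hypothetical variant and shows how adding random noise to a quantitative phenotype tends to weaken the association statistic. It is not the authors' pipeline: the function name, allele frequency, effect size, sample size, and noise levels are all assumptions chosen for illustration, and a simple correlation-based statistic stands in for a full GWAS.

```python
import numpy as np


def simulate_noise_attenuation(n_samples=20000, effect_size=0.2,
                               noise_sds=(0.0, 0.5, 1.0, 2.0), seed=0):
    """Return {noise_sd: association statistic} for one simulated variant.

    All parameter values are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical variant: allele counts (0, 1, 2) with allele frequency 0.3.
    genotype = rng.binomial(2, 0.3, size=n_samples).astype(float)

    # "Clean" quantitative phenotype: additive genetic effect plus unit-variance residual.
    phenotype = effect_size * genotype + rng.normal(0.0, 1.0, size=n_samples)

    results = {}
    for sd in noise_sds:
        # Corrupt the label with independent Gaussian noise of standard deviation sd.
        noisy = phenotype + rng.normal(0.0, sd, size=n_samples)

        # Pearson correlation between genotype and the noisy label.
        r = np.corrcoef(genotype, noisy)[0, 1]

        # t-statistic for the correlation (equivalent to the slope test in simple regression).
        results[sd] = r * np.sqrt((n_samples - 2) / (1.0 - r ** 2))
    return results


if __name__ == "__main__":
    for sd, stat in simulate_noise_attenuation().items():
        print(f"noise sd = {sd:.1f}  association t-statistic = {stat:.2f}")
```

In this toy setting the statistic shrinks as the noise standard deviation grows, which is the loss of statistical power that a phenotyping model able to recover the underlying score would be expected to counteract.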