Representation Learning based Target Discovery from UKBB MRI data

Published: 13 Oct 2024, Last Modified: 01 Dec 2024
AIDrugX Poster
License: CC BY 4.0
Keywords: representation learning, GWAS, masked autoencoders, UKBB MRI, video MAE, phenotypes
TL;DR: Leveraging self-supervised representation learning, this work proposes two methods to derive disease-relevant phenotypes from UKBB MRI images. The methods overcome the limitations of manual annotations and yield more biologically relevant genetic associations.
Abstract: Medical imaging technologies such as MRI and CT scans offer valuable insights into a person's biological condition. Phenotypes derived from these images are essential for the discovery of novel drug targets. Traditional Genome-Wide Association Studies (GWAS) on imaging-derived phenotypes (IDPs) require laborious manual feature annotation, extraction of disease-related phenotypes, and subsequent analysis of their associations with genetic variants. This approach has two main limitations: (1) manual voxel-level annotations are time-consuming and subjective, particularly for intricate features; (2) these annotations are often limited to a handful of human-definable features, overlooking the wealth of information present in the scans. To address these limitations, we propose an alternative approach to deriving phenotypes, which we term embedding-derived phenotypes (EDPs). Our approach consists of two steps. First, we train a self-supervised representation learning model to transform scans into latent embeddings, eliminating the need for manual annotations. Second, we convert these embeddings into disease-relevant phenotypes, preserving information that may be lost in manually derived phenotypes. Although there are numerous self-supervised representation learning methods, it is not straightforward to transform the embeddings from these models into disease-relevant phenotypes. We present two simple methods that leverage binary labels such as ICD-10 codes, and we demonstrate that they identify more biologically meaningful genetic associations than using ICD-10 codes alone as binary traits or using manually derived phenotypes.
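The abstract does not spell out the two conversion methods. As a purely illustrative sketch of the general idea (turning a scan embedding plus a binary ICD-10 label into a continuous trait that a GWAS can test), one option is a linear probe: fit a logistic regression from embedding to label, take its logit as the phenotype, and then regress that phenotype on each variant's allele dosage. The function names (`fit_logistic_probe`, `embedding_derived_phenotype`, `per_variant_association`), the plain gradient-descent fit, the L2 penalty, and the absence of covariates are all hypothetical choices made for this example, not the authors' procedure. The sketch also assumes the embeddings are already standardized.

```python
import numpy as np

def fit_logistic_probe(embeddings, labels, lr=0.1, n_iter=500, l2=1e-3):
    """Hypothetical linear probe: fit w, b so that sigmoid(X @ w + b) approximates P(label = 1)."""
    n, d = embeddings.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        logits = embeddings @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        err = probs - labels  # per-sample gradient of the log-loss w.r.t. the logit
        w -= lr * (embeddings.T @ err / n + l2 * w)
        b -= lr * err.mean()
    return w, b

def embedding_derived_phenotype(embeddings, w, b):
    """Continuous phenotype per participant: the probe's logit (an assumed choice)."""
    return embeddings @ w + b

def per_variant_association(phenotype, dosages):
    """Simple per-variant linear regression of the standardized phenotype on dosage.

    dosages: (n_participants, n_variants) array of 0/1/2 allele counts.
    No covariates are included here; each dosage column must have nonzero variance.
    """
    y = (phenotype - phenotype.mean()) / phenotype.std()
    betas = []
    for g in dosages.T:
        gc = g - g.mean()
        betas.append(gc @ y / (gc @ gc))  # OLS slope for a single centered predictor
    return np.array(betas)
```

In a real analysis the probe would more likely be fit on held-out folds, so that a participant's phenotype does not come from a model trained on that participant's own label, and the association step would normally adjust for covariates such as age, sex, and genetic principal components.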
Submission Number: 37