Hundreds of cardiac MRI traits derived using 3D diffusion autoencoders share a common genetic architecture

Sara Ometto, Soumick Chatterjee, Andrea Mario Vergani, Arianna Landini, Sodbo Sharapov, Edoardo Giacopuzzi, Alessia Visconti, Emanuele Bianchi, Federica Santonastaso, Emanuel M. Soda, Francesco Cisternino, Carlo Andrea Pivato, Francesca Ieva, Emanuele Di Angelantonio, Nicola Pirastu, Craig A. Glastonbury

Published: 05 Nov 2024, Last Modified: 08 Jan 2026, Crossref, CC BY-SA 4.0
Abstract: Biobank-scale imaging provides an unprecedented opportunity to characterise thousands of organ phenotypes, how they vary in populations and how they relate to disease outcomes. However, deriving specific phenotypes from imaging data, such as Magnetic Resonance Imaging (MRI), requires time-consuming expert annotation, which limits scalability and does not exploit how information-dense such image acquisitions are. In this study, we developed a 3D diffusion autoencoder to derive latent phenotypes from temporally resolved cardiac MRI data of 71,021 UK Biobank participants. These phenotypes were reproducible, heritable (h² = 4–18%), and significantly associated with cardiometabolic traits and outcomes, including atrial fibrillation (P = 8.5 × 10⁻²⁹) and myocardial infarction (P = 3.7 × 10⁻¹²). Using latent space manipulation techniques, we were able to learn, directly interpret and visualise what specific latent phenotypes capture in a given MRI. To establish the genetic basis of these traits, we performed a genome-wide association study, identifying 89 significant common variants (P < 2.3 × 10⁻⁹) across 42 loci, including seven novel loci. Extensive multi-trait colocalisation analyses (PP.H₄ > 0.8) linked variants across phenotypic scales, from intermediate cardiac traits to cardiac disease endpoints. For example, rs142556838, which falls in CCDC141, colocalises with a latent imaging phenotype and a diastolic blood pressure locus. Using single-cell RNA-sequencing data, we mapped CCDC141 expression specifically to a population of ventricular cardiomyocytes. Finally, Polygenic Risk Scores (PRS) derived from latent phenotypes demonstrated predictive power for a range of cardiometabolic diseases and enabled us to stratify individuals into different risk groups.
In conclusion, this study showcases the use of diffusion autoencoding methods as powerful tools for unsupervised phenotyping, genetic discovery and disease risk prediction using cardiac MRI data.