DNA Language Model and Interpretable Graph Neural Network Identify Genes and Pathways Involved in Rare Diseases

Published: 06 Jul 2024, Last Modified: 28 Jul 2024
Venue: Language and Molecules ACL 2024 Poster
License: CC BY 4.0
Keywords: DNA language model, graph neural network, gene prioritisation, pathways identification, rare diseases
TL;DR: We used a DNA language model to generate gene embeddings that reflect changes caused by pathogenic variants, then used these embeddings to find causal genes and pathways.
Abstract: Identification of causal genes and pathways is a critical step for understanding the genetic underpinnings of rare diseases. We propose novel approaches to gene prioritization and pathway identification using a DNA language model, graph neural networks, and a genetic algorithm. Using HyenaDNA, a long-range genomic foundation model, we generated dynamic gene embeddings that reflect changes caused by deleterious variants. These gene embeddings were then used to identify candidate genes and pathways. We validated our method on a cohort of rare disease patients with partially known genetic diagnoses, demonstrating the re-identification of known causal genes and pathways and the detection of novel candidates. These findings have implications for the prevention and treatment of rare diseases by enabling targeted identification of new drug targets and therapeutic pathways.
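The idea of a "dynamic" gene embedding that shifts when a variant is introduced can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_embed` is a hypothetical stand-in that hashes 3-mers and is not HyenaDNA, the gene names and sequences are invented, and ranking genes by cosine distance between reference and variant embeddings is one plausible scoring choice rather than the method the abstract describes. The graph neural network and genetic algorithm steps are not sketched.

```python
import hashlib
import math


def toy_embed(sequence, dim=8, k=3):
    """Hypothetical stand-in for a DNA language model (NOT HyenaDNA):
    averages a deterministic hash-derived vector over all k-mers."""
    vec = [0.0] * dim
    count = 0
    for i in range(len(sequence) - k + 1):
        digest = hashlib.sha256(sequence[i:i + k].encode()).digest()
        for j in range(dim):
            vec[j] += digest[j] / 255.0 - 0.5
        count += 1
    return [v / count for v in vec] if count else vec


def cosine_distance(a, b):
    """1 - cosine similarity, clamped at 0 to absorb rounding error."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return max(0.0, 1.0 - dot / (norm_a * norm_b))


def apply_snv(sequence, position, alt):
    """Return the sequence with a single-nucleotide substitution (0-based)."""
    return sequence[:position] + alt + sequence[position + 1:]


def rank_genes(reference, variants):
    """Score each gene by how far its embedding moves once the patient's
    variants are applied, then sort from largest to smallest shift."""
    scores = {}
    for gene, ref_seq in reference.items():
        alt_seq = ref_seq
        for position, alt in variants.get(gene, []):
            alt_seq = apply_snv(alt_seq, position, alt)
        scores[gene] = cosine_distance(toy_embed(ref_seq), toy_embed(alt_seq))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented toy data: GENE_A carries one substitution, GENE_B carries none.
reference = {
    "GENE_A": "ATGGCCATTGTAATGGGCCGC",
    "GENE_B": "ATGAAACCCGGGTTTTAG",
}
variants = {"GENE_A": [(5, "T")]}

ranked = rank_genes(reference, variants)
for gene, score in ranked:
    print(f"{gene}\t{score:.4f}")
```

In this toy setup the gene carrying a variant moves away from its reference embedding and is ranked first, while the unchanged gene scores essentially zero. With a real genomic foundation model the embedding function would be replaced by the model's own sequence representation.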
Archival Option: The authors of this submission do *not* want it to appear in the archival proceedings.
Submission Number: 23