Keywords: denoising autoencoder, autoencoder, DAE, genetics, genomics, representation learning, applied machine learning, neural network
TL;DR: We present GeneDAE, a sparse denoising autoencoder that extracts interpretable gene embeddings from population-level genotype data and may be a useful tool for discovering or analyzing gene-to-disease associations in genomics research.
Abstract: One challenge in genomics research is identifying functionally relevant genes associated with diseases. We present GeneDAE, a sparse denoising autoencoder that extracts gene representations from large-scale population-level genotype data, which can then be used to identify gene-to-disease associations. The connections of the GeneDAE encoder and decoder follow a bipartite biological knowledge graph that links individual variants (single nucleotide polymorphisms; SNPs) to their nearby genes. As a result, each node in the hidden layer serves as an interpretable, multi-purpose gene embedding derived only from nearby variants, which are the most likely to affect that gene's function. We use the UK Biobank dataset and focus on the major histocompatibility complex (MHC) region of the genome, which is critical to immune function and autoimmune disease pathophysiology. Using GeneDAE, we extracted 239 MHC gene embeddings and identified novel gene-to-disease associations.
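The graph-constrained architecture can be illustrated with a minimal pure-Python sketch. This is not the GeneDAE implementation. It makes the following assumptions:

- There is a single hidden layer with sigmoid units and a linear decoder.
- Corruption is input dropout, and training minimizes squared-error reconstruction of the clean input.
- Genotypes are minor-allele counts (0, 1, 2) scaled to [0, 1].
- "Sparse" refers only to the connectivity sparsity imposed by the SNP-gene graph. Any additional activation-sparsity penalty GeneDAE may use is not shown.

The SNP and gene identifiers, `build_mask`, and `MaskedDAE` are all hypothetical. The key idea is a binary mask, derived from the bipartite graph, that zeroes every encoder and decoder weight without a SNP-gene edge. As a result, each hidden unit (gene embedding) is driven only by that gene's nearby SNPs.

```python
import math
import random

# Hypothetical toy bipartite SNP-gene graph (names are illustrative only).
# Real edges would come from genome annotation, e.g. SNPs within a window of each gene.
SNP_TO_GENES = {
    "snp_1": ["GENE_A"],
    "snp_2": ["GENE_A", "GENE_B"],
    "snp_3": ["GENE_B"],
    "snp_4": ["GENE_C"],
}


def build_mask(snp_to_genes):
    """Return (snps, genes, mask) with mask[g][s] = 1.0 iff SNP s is linked to gene g."""
    snps = sorted(snp_to_genes)
    genes = sorted({g for gs in snp_to_genes.values() for g in gs})
    mask = [[1.0 if g in snp_to_genes[s] else 0.0 for s in snps] for g in genes]
    return snps, genes, mask


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class MaskedDAE:
    """Assumed single-hidden-layer DAE whose connections are restricted by a SNP-gene mask."""

    def __init__(self, mask, seed=0):
        self.rng = random.Random(seed)
        self.mask = mask  # shape: genes x snps
        n_genes, n_snps = len(mask), len(mask[0])
        # Encoder (genes x snps) and decoder (snps x genes) weights start at zero
        # wherever the graph has no SNP-gene edge.
        self.w_enc = [[self.rng.gauss(0, 0.1) * mask[g][s] for s in range(n_snps)]
                      for g in range(n_genes)]
        self.w_dec = [[self.rng.gauss(0, 0.1) * mask[g][s] for g in range(n_genes)]
                      for s in range(n_snps)]
        self.b_enc = [0.0] * n_genes
        self.b_dec = [0.0] * n_snps

    def encode(self, x):
        """Gene embedding: one hidden unit per gene, driven only by its linked SNPs."""
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w_enc, self.b_enc)]

    def decode(self, h):
        """Reconstruct each SNP only from the genes it is linked to."""
        return [sum(w * hi for w, hi in zip(row, h)) + b
                for row, b in zip(self.w_dec, self.b_dec)]

    def train_step(self, x, drop_prob=0.2, lr=0.1):
        """One SGD step: corrupt x, reconstruct, and compare against the clean x."""
        x_noisy = [0.0 if self.rng.random() < drop_prob else xi for xi in x]
        h = self.encode(x_noisy)
        x_hat = self.decode(h)
        d_out = [xh - xi for xh, xi in zip(x_hat, x)]
        # Backpropagate into the hidden layer before updating the decoder weights.
        d_h = [sum(d_out[s] * self.w_dec[s][g] for s in range(len(x))) for g in range(len(h))]
        d_z = [dh * hg * (1.0 - hg) for dh, hg in zip(d_h, h)]
        for s in range(len(x)):
            self.b_dec[s] -= lr * d_out[s]
            for g in range(len(h)):
                # Multiplying by the mask keeps absent edges at exactly zero.
                self.w_dec[s][g] -= lr * d_out[s] * h[g] * self.mask[g][s]
        for g in range(len(h)):
            self.b_enc[g] -= lr * d_z[g]
            for s in range(len(x)):
                self.w_enc[g][s] -= lr * d_z[g] * x_noisy[s] * self.mask[g][s]
        return 0.5 * sum(d * d for d in d_out)


# Usage on a made-up toy cohort (values are minor-allele counts / 2).
snps, genes, mask = build_mask(SNP_TO_GENES)
dae = MaskedDAE(mask, seed=42)
cohort = [[0.0, 0.5, 1.0, 0.0], [0.5, 0.5, 0.0, 1.0], [1.0, 0.0, 0.5, 0.5]]
for _ in range(200):
    for x in cohort:
        dae.train_step(x)
embeddings = [dae.encode(x) for x in cohort]  # one value per gene per individual
```

Because the mask multiplies every gradient, weights without an edge stay at zero throughout training. Changing a SNP linked only to `GENE_C` therefore leaves the `GENE_A` and `GENE_B` embeddings unchanged. This locality is what makes each hidden unit interpretable as a gene-level summary. Per-individual embeddings like these could then serve as features in downstream association tests. How GeneDAE performs that step is not specified here.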