GeneDAE: A Sparse Denoising Autoencoder for Deriving Interpretable Gene Embeddings

Monica Isgut; Neha Jain; Andrew Hornback; Karan Samel; May Dongmei Wang

01 Mar 2023 (modified: 31 May 2023)
Submitted to Tiny Papers @ ICLR 2023
Readers: Everyone

Keywords: denoising autoencoder, autoencoder, dae, genetics, genomics, representation learning, applied machine learning, neural network

TL;DR: We present GeneDAE, a sparse denoising autoencoder that extracts interpretable gene embeddings from population-level genotype data, offering a potentially useful tool for discovering or analyzing gene-to-disease associations in genomics research.

Abstract: A challenge in genomics research is identifying functionally relevant genes associated with diseases. We present GeneDAE, a sparse denoising autoencoder that extracts gene representations from large-scale, population-level genotype data; these representations can then be used to identify gene-to-disease associations. The GeneDAE encoder and decoder connections are modeled on a bipartite biological knowledge graph that connects individual variants (single nucleotide polymorphisms; SNPs) to their nearby genes. As a result, each node in the hidden layer can serve as an interpretable, multi-purpose gene embedding, derived only from variants in close proximity to the gene, which are the ones most likely to impact gene function. We use the UK Biobank dataset and focus on the major histocompatibility complex (MHC) region of the genome, which is critical to immune function and autoimmune disease pathophysiology. Using GeneDAE, we extracted 239 MHC gene embeddings and identified novel gene-to-disease associations.
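The graph-constrained connectivity described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the toy SNP-to-gene map, the tanh activation, the masking-noise corruption, the squared-error loss, and all function names (`build_mask`, `corrupt`, `forward`) are assumptions chosen for illustration. The only idea taken from the abstract is that encoder and decoder weights exist solely where the bipartite graph links a SNP to a gene, with one hidden unit per gene.

```python
import numpy as np

def build_mask(snp_to_genes, n_snps, n_genes):
    """Binary (n_snps, n_genes) mask: 1 where the graph links a SNP to a gene."""
    mask = np.zeros((n_snps, n_genes))
    for snp, genes in snp_to_genes.items():
        for gene in genes:
            mask[snp, gene] = 1.0
    return mask

def corrupt(x, drop_prob, rng):
    """Masking noise (an assumed corruption): zero each entry with prob. drop_prob."""
    keep = rng.random(x.shape) >= drop_prob
    return x * keep

def forward(x, w_enc, b_enc, w_dec, b_dec, mask):
    """Sparse encoder/decoder pass; weights outside the graph are zeroed out."""
    hidden = np.tanh(x @ (w_enc * mask) + b_enc)   # (batch, n_genes): one unit per gene
    recon = hidden @ (w_dec * mask.T) + b_dec      # (batch, n_snps): reconstructed genotypes
    return hidden, recon

# Hypothetical toy graph: 4 SNPs, 2 genes; SNP 2 lies near both genes.
rng = np.random.default_rng(0)
n_snps, n_genes = 4, 2
snp_to_genes = {0: [0], 1: [0], 2: [0, 1], 3: [1]}
mask = build_mask(snp_to_genes, n_snps, n_genes)

w_enc = rng.normal(size=(n_snps, n_genes))
w_dec = rng.normal(size=(n_genes, n_snps))
b_enc = np.zeros(n_genes)
b_dec = np.zeros(n_snps)

# Genotypes encoded as allele counts 0/1/2 (an assumed encoding).
x = rng.integers(0, 3, size=(5, n_snps)).astype(float)
x_noisy = corrupt(x, drop_prob=0.2, rng=rng)

hidden, recon = forward(x_noisy, w_enc, b_enc, w_dec, b_dec, mask)
loss = np.mean((recon - x) ** 2)  # denoising objective: reconstruct the clean input
```

Because the mask zeroes every weight that is not on a graph edge, the hidden unit for a gene is a function of that gene's nearby SNPs only, which is what makes each unit readable as a gene embedding. In a trained model the mask would also be applied to the gradients (or reapplied after each update) so that off-graph weights stay at zero.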
