Keywords: graph neural networks, genome assembly, de novo, graph algorithms
TL;DR: We train a model to execute graph simplification algorithms commonly used in de novo genome assembly.
Abstract: De novo genome assembly focuses on finding connections between a vast number
of short sequences in order to reconstruct the original genome. The central problem
of genome assembly can be described as finding a Hamiltonian path through a large
directed graph with a constraint that an unknown number of nodes and edges should
be avoided. However, due to local structures in the graph and biological features,
the problem can be reduced to graph simplification, which includes removal of
redundant information. Motivated by recent advancements in graph representation
learning and neural execution of algorithms, in this work we train an MPNN model
with a max aggregator to execute several algorithms for graph simplification. We
show that the algorithms were learned successfully and can be scaled to graphs of
sizes up to 20 times larger than the ones used in training. We also test on graphs
obtained from real-world genomic data, namely that of a lambda phage and E. coli.
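
As a rough illustration of what "max aggregator" means in a message passing network, the sketch below runs one round of message passing on a small directed graph in plain Python. It is not the model from this work: the function name `max_aggregate_step`, the identity message function, the zero vector for nodes without incoming edges, and the additive update are all assumptions made for the example. A trained MPNN would use learned functions in their place.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def max_aggregate_step(
    features: Dict[int, Vector],
    edges: List[Tuple[int, int]],
) -> Dict[int, Vector]:
    """One message passing round with element-wise max aggregation.

    Each directed edge (src, dst) sends a message from src to dst.
    Assumption: the message is simply the sender's feature vector;
    a learned message function would replace this in a real model.
    """
    dim = len(next(iter(features.values())))

    # Collect the incoming messages for every node.
    inbox: Dict[int, List[Vector]] = {node: [] for node in features}
    for src, dst in edges:
        inbox[dst].append(features[src])

    updated: Dict[int, Vector] = {}
    for node, own in features.items():
        messages = inbox[node]
        if messages:
            # Element-wise maximum over all incoming messages.
            aggregated = [max(m[i] for m in messages) for i in range(dim)]
        else:
            # Assumption: nodes without incoming edges aggregate to zeros.
            aggregated = [0.0] * dim
        # Assumption: the update adds the aggregate to the node's own features.
        updated[node] = [o + a for o, a in zip(own, aggregated)]
    return updated


# Toy graph: edges 0 -> 2, 1 -> 2, 2 -> 0.
features = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [0.5, 0.5]}
edges = [(0, 2), (1, 2), (2, 0)]
result = max_aggregate_step(features, edges)
```

Because the maximum picks out the strongest incoming signal per feature rather than summing or averaging over all neighbours, the aggregate does not grow with the number of neighbours, which is one intuition for why it may transfer to larger graphs.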
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/a-step-towards-neural-genome-assembly/code)