Keywords: Soft Combinatorial Optimization, End-to-end Differentiable Models, Phylogenetic tree, Evolutionary Biology, Maximum Parsimony, Tree sampling
TL;DR: We introduce a differentiable approach to phylogenetic tree construction that optimizes the tree and its ancestral sequences directly in their original representation, and therefore requires no prior training data.
Abstract: Inferring the most probable evolutionary tree from its leaf nodes is an important problem in computational biology, because the tree reveals the evolutionary relationships between species. Because the number of possible tree topologies grows exponentially with the number of leaves, an exhaustive search for the best tree is computationally infeasible. In this work, we propose a novel differentiable approach as an alternative to the traditional heuristic-based combinatorial tree search methods used in phylogeny. The optimization objective is to find the most parsimonious tree (i.e., the tree that minimizes the total number of evolutionary changes). We empirically evaluate our method on randomly generated trees of up to 128 leaves, with each node represented by a protein sequence of length 256. Given only leaf node information, our method exhibits promising convergence ($<1$% error for trees of up to 32 leaves, $<8$% error for trees of up to 128 leaves), illustrating its potential for much broader phylogenetic inference problems and possible integration with end-to-end differentiable models. The code to reproduce the experiments in this paper can be found at https://github.ramith.io/diff-evol-tree-search.
Submission Number: 46
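To make the parsimony objective in the abstract concrete, here is a minimal sketch of how the cost of one fixed candidate tree could be scored: the sum of per-site differences (Hamming distance) across every parent-child edge. It does not reproduce the paper's differentiable method. The function names `hamming` and `parsimony_cost`, the child-to-parent dictionary, and the toy 4-residue sequences are all illustrative assumptions, not taken from the paper or its repository.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def parsimony_cost(parent: dict, seqs: dict) -> int:
    """Total number of changes summed over all (child, parent) edges.

    `parent` maps each non-root node to its parent (hypothetical encoding);
    `seqs` maps every node, leaf or ancestor, to its sequence.
    """
    return sum(hamming(seqs[child], seqs[par]) for child, par in parent.items())


# Toy tree: root R has children X and Y; X has leaves L1, L2; Y has leaves L3, L4.
parent = {"L1": "X", "L2": "X", "L3": "Y", "L4": "Y", "X": "R", "Y": "R"}
seqs = {
    "L1": "ACDE", "L2": "ACDF", "L3": "GCDF", "L4": "GCDY",  # observed leaves
    "X": "ACDE", "Y": "GCDF", "R": "ACDF",                   # assumed ancestors
}

cost = parsimony_cost(parent, seqs)
print(cost)
```

In this toy case, edges L2-X, L4-Y, X-R, and Y-R each contribute one change, so the score is 4. A tree search would compare this score across candidate topologies and ancestral sequence assignments, which is the discrete problem the abstract proposes to relax.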