Statistically Consistent Estimation of Rooted and Unrooted Level-1 Phylogenetic Networks from SNP Data

Published: 01 Jan 2024, Last Modified: 29 Oct 2024, RECOMB-CG 2024, CC BY-SA 4.0
Abstract: We address the problem of estimating a rooted phylogenetic network, as well as its unrooted version, from SNPs (single nucleotide polymorphisms), allowing for multiple crossover events. Consequently, each SNP is assumed to have evolved under the infinite sites assumption down some tree contained in the phylogenetic network. We prove that a level-1 phylogenetic network can be reconstructed uniquely from any set of SNPs that covers all bipartitions of the rooted trees contained in the network, even when the ancestral state is unknown. To the best of our knowledge, this is the first result establishing that the unrooted topology of a level-1 network is uniquely recoverable from SNPs without known ancestral states. We present a stochastic model for DNA evolution, and we prove that Gusfield's algorithms in JCSS 2005 (one for the case where the ancestral state is known, the other for the case where it is not) can be used within polynomial-time, statistically consistent pipelines to estimate level-1 phylogenetic networks under the proposed model when all cycles have length at least five, provided that we have access to an oracle indicating which sites in the DNA alignment are SNPs.
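Two ideas in the abstract can be made concrete with a small sketch: a binary SNP evolving under the infinite sites assumption induces a bipartition of the taxa, and two SNPs can both have evolved down a single tree only if they pass a pairwise compatibility test. Algorithms for galled trees and level-1 networks in the style of Gusfield's work analyze this pairwise incompatibility (the four-gamete test when the ancestral state is unknown, and a three-gamete test when the ancestral state is known and encoded as 0). The code below is a minimal sketch of that preprocessing step only, assuming a 0/1-encoded SNP matrix with taxa as rows, sites as columns, and no missing data; it is not the paper's pipeline, and the function names `bipartition`, `incompatible`, `conflict_graph`, and `components` are hypothetical.

```python
from itertools import combinations


def bipartition(column, taxa):
    """Split of the taxa induced by one binary SNP column (states 0/1).

    A constant column yields a trivial split with one empty side.
    """
    zeros = frozenset(t for t, s in zip(taxa, column) if s == 0)
    ones = frozenset(t for t, s in zip(taxa, column) if s == 1)
    return frozenset({zeros, ones})


def incompatible(col_a, col_b, ancestral_known=False):
    """Pairwise test: can two sites NOT both evolve down one tree?

    Root unknown: four-gamete test (all of 00, 01, 10, 11 appear).
    Root known (ancestral state 0 at both sites): the root implicitly
    contributes (0, 0), so the sites conflict iff 01, 10, 11 all appear.
    """
    gametes = set(zip(col_a, col_b))
    if ancestral_known:
        return {(0, 1), (1, 0), (1, 1)} <= gametes
    return len(gametes) == 4


def conflict_graph(matrix, ancestral_known=False):
    """Edges (i, j), i < j, between site indices that fail the test."""
    n_sites = len(matrix[0])
    cols = [[row[j] for row in matrix] for j in range(n_sites)]
    return {
        (i, j)
        for i, j in combinations(range(n_sites), 2)
        if incompatible(cols[i], cols[j], ancestral_known)
    }


def components(n_sites, edges):
    """Connected components of the conflict graph via union-find."""
    parent = list(range(n_sites))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for v in range(n_sites):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


if __name__ == "__main__":
    # Rows are taxa A, B, C, D; columns are three SNP sites.
    matrix = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]
    print(bipartition([row[0] for row in matrix], "ABCD"))
    edges = conflict_graph(matrix)
    print(edges)                 # sites 0 and 1 show all four gametes
    print(components(3, edges))  # incompatible sites grouped together
```

The two testing modes can disagree: the columns `[1, 1, 0, 0]` and `[0, 1, 1, 1]` show only three gametes and are compatible when the root is unknown, but they conflict once the ancestral state 0 is assumed, because the root then supplies the missing `(0, 0)` gamete.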