Phylogenetic Placement of Aligned Genomes and Metagenomes with Non-tree-like Evolutionary Histories

Published: 06 Sept 2023, Last Modified: 08 Apr 2024, OpenReview Archive Direct Upload, Everyone, CC BY-NC-SA 4.0
Abstract: Phylogenetic placement is the computational task that places a query taxon into a reference phylogeny using computational analysis of biomolecular sequence data or other evolutionary characters. A chief advantage of phylogenetic placement over one-shot phylogenetic reconstruction is greatly reduced computational requirements, and the former has been applied in many different topics in phylogenetics. One of the more recent applications has been enabled by rapid advances in biomolecular sequencing technology: classification of genomes, metagenomes, and metagenome-assembled genomes (MAGs) in large-scale datasets produced by next-generation sequencing. A number of methods have been developed for this purpose, and all share the common simplifying assumption that a phylogenetic tree suffices for modeling the evolutionary history of all genomes and/or metagenomes under study. Another parallel development in today’s post-genomic era is a greater understanding of the prevalence and importance of non-tree-like evolution in the Tree of Life – the evolutionary history of all life on Earth – which in fact may not be a tree at all. More general graph representations such as phylogenetic networks have therefore been proposed, and a new generation of phylogenetic network reconstruction methods is under active development. But the simplifying assumption made by phylogenetic tree placement methods is fundamentally at odds with the non-tree-like evolutionary histories of many microbes and other organisms. The consequences of this conflict are poorly understood. In this study, we conduct a comprehensive performance study to directly assess the impact of non-tree-like evolution on phylogenetic tree placement of genomes and metagenomes. Our study includes in silico simulation experiments as well as empirical data analyses.
We find that the topological accuracy of phylogenetic tree placement degrades quickly as genomic sequence evolution becomes increasingly non-tree-like. We then introduce a new statistical method for phylogenetic network placement of genomes and metagenomes, which we refer to as NetPlacer version 0. Initial experiments with NetPlacer provide a proof-of-concept, but also point to the need for greater computational scalability. We conclude with thoughts on algorithmic techniques to enable fast and accurate phylogenetic network placement.