MDTREE: A Masked Dynamic Autoregressive Model for Phylogenetic Inference

23 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Phylogenetic Inference, Genome Language Model, Transformer, Graph Structure Generation, DNA, Large Language Models
Abstract: Phylogenetic tree inference, crucial for understanding species evolution, presents the challenge of jointly optimizing continuous branch lengths and discrete tree topologies. Traditional Markov Chain Monte Carlo methods, though widely adopted, suffer from slow convergence and high computational cost. Deep learning methods offer more scalable alternatives but still face limitations: Bayesian generative models struggle with computational complexity, autoregressive models are constrained by predefined species orders, and generative flow networks fail to fully leverage the evolutionary signal in genomic sequences. In this paper, we introduce MDTree, a novel framework that reframes phylogenetic tree generation as dynamically learning node orders from biological priors embedded in genomic sequences. By leveraging a Diffusion Ordering Network to learn evolutionarily meaningful node orders, MDTree autoregressively positions nodes to construct biologically coherent trees. To further accelerate generation, we propose a dynamic masking mechanism that enables parallel node processing. Extensive experiments show that MDTree outperforms existing methods on standard phylogenetic benchmarks, offering biologically interpretable and computationally efficient tree generation.
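The abstract describes a two-stage pipeline: a Diffusion Ordering Network produces a node order from genomic embeddings, and nodes are then attached autoregressively to build the tree. The sketch below is a minimal, hypothetical illustration of that flow only; the class and function names (DiffusionOrderingNetwork, build_tree_autoregressively), the scoring-based ordering, and the similarity-based attachment rule are placeholder assumptions, not the authors' implementation, and the dynamic masking mechanism is omitted.

```python
# Minimal sketch of the pipeline sketched in the abstract (NOT the authors' code).
# All names and rules below are illustrative assumptions.
import torch
import torch.nn as nn

class DiffusionOrderingNetwork(nn.Module):
    """Toy stand-in: scores per-species genomic embeddings to induce a node order."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, seq_embeddings):           # (n_species, dim)
        scores = self.score(seq_embeddings).squeeze(-1)
        return torch.argsort(scores)              # assumed "evolutionarily meaningful" order

def build_tree_autoregressively(order, seq_embeddings):
    """Attach nodes one by one in the learned order; a real model would predict
    attachment positions and branch lengths from the partial-tree state."""
    tree_edges = []
    placed = [order[0].item()]
    for idx in order[1:]:
        # placeholder attachment rule: join the most similar already-placed node
        sims = seq_embeddings[placed] @ seq_embeddings[idx]
        parent = placed[int(sims.argmax())]
        tree_edges.append((parent, idx.item()))
        placed.append(idx.item())
    return tree_edges

if __name__ == "__main__":
    torch.manual_seed(0)
    emb = torch.randn(6, 16)                      # stand-in genome-language-model embeddings
    order = DiffusionOrderingNetwork(16)(emb)
    print(build_tree_autoregressively(order, emb))
```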
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2808