PhyloTextDiff: Text-Based Discrete Diffusion for Generative Phylogenetic Inference

ICLR 2026 Conference Submission 21522 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Phylogenetic Inference, Discrete Diffusion Models, Generative Modeling, Bayesian Inference
Abstract: Phylogenetic inference aims to reconstruct the evolutionary relationships among species from DNA sequence data. Despite its long history and broad applications, accurately modeling phylogenetic tree distributions remains challenging due to the combinatorial explosion of possible topologies. In this work, we introduce PhyloTextDiff, the first text-based discrete diffusion model for phylogenetic inference. PhyloTextDiff is trained once on multiple DNA matrices, allowing it to learn common patterns in both the DNA sequences and the textual tree representations. It operates non-autoregressively, enabling fast and scalable generation that is minimally affected by the number of taxa. By leveraging the diffusion process, PhyloTextDiff is particularly well suited to exploring the vast, multimodal landscape of phylogenetic tree space. Experiments on benchmark datasets demonstrate that PhyloTextDiff produces high-quality trees and enables efficient exploration of large phylogenetic spaces, opening the door to large-scale phylogenetic discovery.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 21522