Hybrid Phylogenetic Model Selection using Deep Learning and High-Performance Computing

Published: 12 Nov 2025 · Last Modified: 20 Nov 2025 · AIML-CEB 2025 Poster · CC BY 4.0
Keywords: Phylogenetic inference, Model selection, Deep learning, Maximum likelihood, OpenMP
TL;DR: ModelFinder-DL integrates deep-learning predictions with OpenMP-parallelized likelihood evaluation in IQ-TREE, achieving up to 5.3× faster phylogenetic model selection while maintaining comparable accuracy on large genomic datasets.
Abstract: Model selection is a fundamental step in phylogenetic inference, aiming to identify the evolutionary model that best fits a given multiple sequence alignment (MSA). Tools such as ModelFinder, implemented in the widely used IQ-TREE software, rely on maximum-likelihood methods and information criteria (e.g., Akaike or Bayesian) to determine the optimal model. Recently, ModelRevelator introduced a deep neural network that predicts one of six commonly used substitution models, along with the Gamma-distributed rate heterogeneity model, directly from the MSA. While ModelFinder is more accurate, it becomes computationally expensive for large MSAs, whereas ModelRevelator is faster but less precise. Here, we present ModelFinder-DL, a hybrid model selection framework that integrates ModelRevelator with ModelFinder. In this framework, neural network predictions guide and constrain the likelihood-based evaluation, combining efficiency with robustness. To further enhance computational performance, we integrate OpenMP-based parallelism into ModelFinder-DL, enabling efficient multi-core utilization. In our experiments, ModelFinder-DL achieves up to a 1.4$\times$ speedup over the single-threaded ModelFinder baseline. With four OpenMP threads, ModelFinder-DL attains a 5.3$\times$ speedup, compared to 3.8$\times$ for four-threaded ModelFinder. These results are a first step towards showing how deep-learning-guided optimization, combined with OpenMP parallelization, can improve efficiency while maintaining high accuracy.
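The hybrid scheme described above can be sketched in a few lines of C++. This is a minimal illustration, not the ModelFinder-DL implementation: the candidate set stands in for the models the neural network would propose, `optimize_log_likelihood` is a hypothetical toy stand-in for IQ-TREE's per-model maximum-likelihood optimization, and the information criterion shown is BIC. The key point is that each candidate's likelihood evaluation is independent, so the loop parallelizes directly with an OpenMP `parallel for`.

```cpp
#include <cmath>
#include <string>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for a per-model likelihood optimization. In
// IQ-TREE this would be a full ML fit of branch lengths and model
// parameters; here it is a deterministic toy score for illustration.
double optimize_log_likelihood(const std::string& model) {
    double s = 0.0;
    for (char c : model) s += c;  // toy pseudo-score from the name
    return -s;
}

// Hybrid selection sketch: the neural network has already narrowed the
// search to a small candidate set; evaluate those candidates in
// parallel and return the one with the lowest (best) BIC.
std::string select_model(const std::vector<std::string>& candidates,
                         int n_params_per_model, int n_sites) {
    std::vector<double> bic(candidates.size());
    // Independent evaluations: each iteration touches only bic[i],
    // so the loop is safely parallel with no synchronization needed.
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < (long)candidates.size(); ++i) {
        double lnl = optimize_log_likelihood(candidates[i]);
        bic[i] = -2.0 * lnl
               + n_params_per_model * std::log((double)n_sites);
    }
    size_t best = 0;
    for (size_t i = 1; i < bic.size(); ++i)
        if (bic[i] < bic[best]) best = i;
    return candidates[best];
}
```

Restricting the parallel loop to the network-proposed candidates, rather than the full model space ModelFinder would scan, is where the hybrid approach saves most of its work; the OpenMP pragma is ignored when compiled without `-fopenmp`, so the same code runs serially.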
Submission Number: 4