Keywords: Cross-species genomics; Mixture-of-Experts (MoE); Taxon-specific modeling; Genomic Discovery
TL;DR: GENE-M1 introduces a taxonomy-aligned Mixture-of-Experts architecture for robust cross-species genomic representation learning.
Abstract: Prevailing genomic foundation models rely on a uniform architecture across all species, which overlooks evolutionary divergence and leads to feature interference and limited cross-species generalization. To address this, we introduce GENE-M1, a novel Mixture-of-Experts (MoE) framework strictly governed by biological taxonomy. Our method builds on three core components: (1) a hierarchical expert architecture that instantiates specialized modules for taxonomic ranks (Domain, Kingdom, Phylum, Class) to enable taxon-specific processing; (2) a dynamic router that activates expert pathways aligned with a sequence’s taxonomy, ensuring hierarchical feature extraction; and (3) a progressive training strategy that transfers knowledge from higher to lower taxonomic ranks for stable optimization. In addition, we construct GM-DATA, a large-scale, taxonomically aligned benchmark comprising 294 species spanning 5 Kingdoms, 18 Phyla, and 62 Classes, with broad and balanced coverage across major clades, as well as a held-out GM-DATA(eval) set of 15 unseen species for rigorous cross-species evaluation. Extensive experiments on this benchmark show that GENE-M1 significantly outperforms state-of-the-art baselines in few-shot classification and unsupervised clustering, demonstrating that explicit taxonomic alignment is key to robust and interpretable genomic representation learning. We will release our model, code, and dataset soon.
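To make the routing idea in components (1) and (2) concrete, the following is a minimal sketch and not the GENE-M1 implementation. It assumes one expert per (rank, taxon) pair and replaces the dynamic router with a plain lookup on a sequence's known lineage. The names `TaxonExpert` and `TaxonomyRouter`, the toy residual affine experts, and the feature dimension are all hypothetical and chosen only for illustration.

```python
import random

# Taxonomic ranks named in the abstract, ordered from highest to lowest.
RANKS = ("domain", "kingdom", "phylum", "class")


class TaxonExpert:
    """Toy expert (hypothetical): a small residual affine map for one taxon."""

    def __init__(self, dim, seed):
        rng = random.Random(seed)
        self.weight = [
            [rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)
        ]

    def __call__(self, x):
        # Residual update: x + W x
        return [
            xi + sum(w * xj for w, xj in zip(row, x))
            for xi, row in zip(x, self.weight)
        ]


class TaxonomyRouter:
    """Assumed simplification: routing is a lookup on the labeled lineage."""

    def __init__(self, dim):
        self.dim = dim
        self.experts = {}  # (rank, taxon) -> TaxonExpert

    def expert_for(self, rank, taxon):
        key = (rank, taxon)
        if key not in self.experts:
            # Create the expert lazily the first time a taxon is seen.
            self.experts[key] = TaxonExpert(self.dim, seed=len(self.experts))
        return self.experts[key]

    def route(self, lineage):
        # One (rank, taxon) step per rank, from Domain down to Class.
        return [(rank, lineage[rank]) for rank in RANKS]

    def forward(self, x, lineage):
        for rank, taxon in self.route(lineage):
            x = self.expert_for(rank, taxon)(x)
        return x


router = TaxonomyRouter(dim=4)
human = {"domain": "Eukaryota", "kingdom": "Animalia",
         "phylum": "Chordata", "class": "Mammalia"}
fly = {"domain": "Eukaryota", "kingdom": "Animalia",
       "phylum": "Arthropoda", "class": "Insecta"}

features = [1.0, 0.0, 0.0, 0.0]
out_human = router.forward(features, human)
out_fly = router.forward(features, fly)
```

In this sketch the two lineages share their Domain and Kingdom experts and diverge only at the Phylum and Class ranks, which is one way to read "hierarchical feature extraction". A learned router or the progressive higher-to-lower training schedule described in the abstract would replace the lookup and the random weights.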
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 8598