Submission Track: Short papers presenting ongoing research or work submitted to other venues (up to 5 pages, excluding references)
Keywords: state space models, foundation models, self-supervised learning, biodiversity science, genomics, fungi
Abstract: Accurate taxonomic classification from DNA barcodes is a cornerstone of global biodiversity monitoring, yet it remains challenging for fungi due to extreme label sparsity and long-tailed taxa distributions. Conventional supervised learning methods often falter in this domain: they generalize poorly to unseen species and fail to capture the hierarchical nature of the data. To address these limitations, we introduce BarcodeMamba+, a foundation model for fungal barcode classification built on a powerful and efficient state-space model architecture. Our approach centers on a pretrain-and-fine-tune paradigm that makes use of partially labelled data, and we demonstrate that it is substantially more effective than traditional fully supervised methods in this data-sparse environment. During fine-tuning, we systematically integrate and evaluate a suite of enhancements (hierarchical label smoothing, a weighted loss function, and a multi-head output layer from MycoAI) that specifically target the challenges of fungal taxonomy. Our experiments show that each of these components yields significant performance gains. On the fungal classification benchmark, our final model outperforms a range of existing methods across all taxonomic levels. Our work not only provides a powerful new tool for genomics-based biodiversity research but also establishes an effective and scalable training paradigm for this challenging domain.
Submission Number: 19