OpenMeta: A Comprehensive Multi-Task Benchmark for Metagenomics Understanding

23 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. License: CC BY 4.0
Keywords: Metagenomics, DNA, Pre-Trained Language Model, Transformer, Benchmark
Abstract: Metagenomics is essential for exploring the vast diversity and intricate interactions of microbes that affect health, agriculture, and environmental science. Despite a surge of machine-learning-based metagenomic models addressing these questions, their relative benefits are difficult to assess: the models are evaluated on distinct, partly contrived experimental datasets, and their performance varies across tasks. To address this, we introduce OpenMeta, the first comprehensive benchmark tailored to metagenomic function prediction, which integrates diverse datasets ranging from 1,000 to 213,000 sequences and incorporates hierarchical data. Our results highlight the inadequacies of current genomic models and the superior performance of metagenomic pre-trained models on complex metagenomic data. Furthermore, we identify a critical research gap: the lack of unified models that process both sequence and hierarchical data. Closing this gap could significantly advance the field. OpenMeta sets a new standard for metagenomic analysis, offering insights that could enhance the understanding and application of microbial ecology in biotechnology and environmental science.
Primary Area: datasets and benchmarks
Submission Number: 2813