mRNABench: A curated benchmark for mature mRNA property and function prediction

Published: 02 Mar 2026, Last Modified: 10 Mar 2026
Venue: Gen² 2026 Poster
License: CC BY 4.0
Track: Full / long paper (5-8 pages)
Keywords: RNA, genomics, foundation model, benchmark
TL;DR: We present mRNABench, a comprehensive benchmarking suite for mature mRNA biology that evaluates the representational quality of mature mRNA embeddings from self-supervised nucleotide foundation models.
Abstract: Messenger RNA (mRNA) is central to gene expression, and its half-life, localization, and translation efficiency drive phenotypic diversity in eukaryotic cells. While supervised learning has been used to study the mRNA regulatory code, self-supervised foundation models support a wider range of transfer learning tasks. However, the dearth of standardized benchmarks limits efforts to pinpoint the strengths of various models. Here, we present mRNABench, a benchmarking suite for mature mRNA biology, focused on human transcripts, that evaluates the representational quality of mature mRNA embeddings from self-supervised nucleotide foundation models. We curate 11 datasets and 79 prediction tasks that broadly capture salient properties of mature mRNA, and assess the performance of 24 families of nucleotide foundation models for a total of 259k experiments. Using these experiments, we study parameter scaling, correlations between sequence compressibility and performance, and data-splitting strategies. We identify synergies between two self-supervised learning objectives, and pre-train a new Mamba-based model that achieves state-of-the-art performance using ~700 times fewer parameters. mRNABench can be found at: https://github.com/morrislab/mRNABench.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 10