Keywords: Foundation model, RNA, Drug design
Abstract: Originally marginalized as an intermediate in the information flow from DNA to
protein, RNA has become the star of modern biology, holding the key to precision
therapeutics, genetic engineering, evolutionary origins, and our understanding of
fundamental cellular processes. Yet RNA is as mysterious as it is prolific, serving
as an information store, a messenger, and a catalyst, spanning many undercharacterized
functional and structural classes. Deciphering the language of RNA is
important not only for a mechanistic understanding of its biological functions but
also for accelerating drug design. Toward this goal, we introduce AIDO.RNA, a
pre-trained module for RNA in an AI-driven Digital Organism [1]. AIDO.RNA
has 1.6 billion parameters, was trained on 42 million non-coding RNA
(ncRNA) sequences at single-nucleotide resolution, and achieves state-of-the-art
performance on a comprehensive set of tasks, including structure prediction,
genetic regulation, molecular function across species, and RNA sequence design.
After domain adaptation, AIDO.RNA learns to model essential parts of protein
translation that protein language models, which have received widespread attention
in recent years, do not. More broadly, AIDO.RNA hints at the generality of biological
sequence modeling and the ability to leverage the central dogma to improve many
biomolecular representations. Models and code are available through ModelGenerator
at https://github.com/genbio-ai/AIDO and on Hugging Face.
Submission Number: 73