EvoFlow-RNA: Generating and Representing non-coding RNA with a Language Model

Published: 05 Mar 2025, Last Modified: 16 Apr 2025, ICLR 2025 AI4NA Poster, CC BY 4.0
Track: long paper (up to 6 pages)
Keywords: rna, aptamer, diffusion, mdm, llm, language, nucleic acid, design, de-novo, discrete
TL;DR: Masked diffusion models can reliably design functional RNAs from scratch, and improve existing ones in conditional generation tasks.
Abstract: RNA plays a critical role across numerous biological functions. Recent advances in language modeling show promise for representing RNA, but the possibility of large-scale RNA design and optimization has yet to be explored. We propose EvoFlow-RNA, a bidirectional RNA language model leveraging masked discrete diffusion models (MDMs) for both generative modeling and representation learning. EvoFlow-RNA bridges the gap between RNA sequence representation and design. It outperforms leading RNA models on six BEACON tasks, excelling in secondary structure prediction. For unconditional generation, it synthesizes diverse RNA sequences with native-like biophysical properties. Furthermore, EvoFlow-RNA can optimize aptamer sequences while preserving binding recognition sites. Our results demonstrate EvoFlow-RNA’s effectiveness in RNA modeling, highlighting the capability and potential of masked discrete diffusion for RNA design. Our code is available at https://github.com/AtomBio/evoflow-rna.
Submission Number: 15